Evaluating the Utility of Indirectly Linked Federal Administrative Records for Nonresponse Bias Adjustment

Abstract
Survey researchers are actively seeking powerful auxiliary data sources capable of correcting for possible nonresponse bias in survey estimates of the general population. While several auxiliary data options exist, concerns about their usefulness for addressing nonresponse bias remain. One underutilized, but potentially rich, source of auxiliary data for nonresponse bias adjustment is federal administrative records. While federal records are routinely used to study nonresponse in countries where it is possible to directly link them (via a unique identifier) to population-based samples, such records are not widely used for this purpose in countries that lack a unique identifier to facilitate direct linkage. In this article, we examine the utility of indirectly linked administrative data from a federal employment database for nonresponse bias adjustment in a general population survey in Germany. In short, we find that the linked administrative variables have stronger correlations with the substantive survey variables than do standard paradata variables and that incorporating linked administrative data in nonresponse weighting adjustments reduces relative nonresponse bias to a greater extent than paradata-only weighting adjustments. However, for the majority of weighted survey estimates, including the administrative variables in the weighting adjustment procedure has minimal impact on the point estimates and their variances. We conclude with a general discussion of these findings and comment on the logistical issues associated with this type of linkage relevant to survey practice.
1. INTRODUCTION
There is increasing interest in identifying auxiliary data sources capable of measuring and mitigating the effects of nonresponse bias in surveys (Smith 2011).
It is well-established that auxiliary data collected on both respondents and nonrespondents permit more detailed investigations into the impact of nonresponse bias than the response rate alone (e.g., Schouten, Shlomo, and Skinner 2011; Wagner 2012). Public calls for powerful auxiliary information capable of adjusting for nonresponse bias have been made by prominent survey methodologists, including Groves (2006) who asserts that “effective surveys require the designer to anticipate nonresponse and actively seek auxiliary data that can be used to reduce the effect of the covariance of response propensities and the survey variables” (p. 670) and Smith (2011) who, in summarizing a workshop on using auxiliary data for nonresponse adjustment, remarks that “Much research is needed on the augmenting of sample-frame data with information from other databases and sources” (p. 392). In this article, we consider the use of a relatively underused source of auxiliary data for nonresponse bias adjustment: federal administrative records. In particular, we consider the situation where federal administrative records cannot be directly linked to the survey sample via a unique identifier and, instead, must be indirectly linked using error-prone and nonunique identifiers. The utility of this approach for addressing nonresponse bias is assessed using a novel linked-data source in Germany described in Sakshaug, Antoni, and Sauckel (2017). Using these data, we evaluate associations between the linked administrative variables and survey variables and determine whether incorporating the linked administrative variables in weighting adjustments improves nonresponse bias reduction over paradata alone. 2. BACKGROUND 2.1 Auxiliary Data Sources The range of auxiliary data sources available for general population surveys is sparse (Olson 2013). 
Process-oriented paradata recorded at each call attempt are among the most commonly used auxiliary data sources (Kreuter 2013), but their ability to detect and adjust for nonresponse bias in specific survey estimates is limited. Little and Vartivarian (2005) show that the most effective auxiliary variables for nonresponse bias adjustment are those that are strongly correlated with both the survey variables and the response outcome. While process paradata tends to be moderately correlated with the response outcome, their correlations with the survey variables are weaker (Lin and Schaeffer 1995; Peytchev and Olson 2007; Kreuter and Kohler 2009; Kreuter, Olson, Wagner, Yan, Ezzati‐Rice et al. 2010; Sakshaug and Kreuter 2011). As Smith (2011) alternatively puts it, “Process paradata are much more likely to be related to general causes of nonresponse and much less likely to be related to specific substantive variables. Their value is more in predicting nonresponse in general” (p. 395). Other forms of paradata, such as interviewer observations about the sampled neighborhood (e.g., cleanliness, safety), household (e.g., appearance, size) and household members (e.g., demographic characteristics, proxy variables) are more likely to be associated with substantive survey variables, but in practice these associations tend to be modest (Diez Roux 2001; Peytchev and Olson 2007; West, Kreuter, and Trappmann 2014). Alternatively, there is considerable interest in linking sample auxiliary information from commercial databases to sampled households (Smith 2011). Studies report relatively high rates of linkage (generally greater than seventy percent) in general population samples; however, agreement rates and correlations between the commercial and survey variables vary widely (Raghunathan and Van Howyek 2008; DiSogra, Dennis, and Fahimi 2010; Pasek, Jang, Cobb, Dennis, and DiSogra 2014; Sinibaldi, Kreuter, and Trappmann 2014; West, Wagner, Hubbard, and Hu 2015). 
Further, commercial variables tend to yield only modest improvement in response propensity models with little impact on the resulting weighted estimates (West, Wagner, Hubbard, and Hu 2015). These studies also caution on the quality of commercial data, since they can have high rates of item missing data, outdated information about household occupants, and contents which are not standardized across local government boundaries (Smith and Kim 2013). 2.2 Linked Administrative Data as an Auxiliary Data Source In the present study, we examine a relatively underutilized auxiliary data source that overcomes some of the above limitations: federal administrative records. Such records possess unique qualities that make them a particularly appealing source of auxiliary data for population samples. While not designed for research purposes, federal administrative records often contain detailed and relatively up-to-date information on population members, including participation in government-sponsored programs (e.g., welfare, healthcare) and details regarding financial matters (e.g., taxable earnings, healthcare expenditures). Typically, federal records are longitudinal in nature and document important life course transitions that surveys attempt to measure (e.g., labor force participation, unemployment duration, benefit receipt). For these reasons, federal administrative records are commonly linked to survey respondents to supplement the collected interview data (e.g., Olson 1999; Antoni and Bethmann 2018; Korbmacher and Czaplicki 2013; Freedman, McGonagle, and Andreski 2014; Knies and Burton 2014; Mostafa 2016). Beyond using federal administrative records as a supplementary data source for analyzing survey respondents, their contents related to common survey topics make them a potentially promising source of auxiliary data for addressing nonresponse bias. Several countries (e.g., the Netherlands, Sweden, Finland) already make use of administrative data for this purpose. 
This is facilitated through population registers which are used as sampling frames for household surveys (Blom and Carlsson 1999; UNECE 2007; Wallgren and Wallgren 2007). These population registers contain unique personal identifiers through which various substantive administrative databases can be directly linked to the drawn sample. However, this situation is atypical in countries where population registers either do not exist or lack a unique identifier that facilitates direct linkage. Linking federal administrative data to general population samples must therefore be carried out using indirect linkage procedures (e.g., probabilistic linkage) that rely on nonunique and error-prone identifiers (e.g., first and last name, address; for an overview of record linkage error sources, the reader is referred to Sakshaug and Antoni 2017). Published case studies of indirect linkages between federal administrative records and general population samples (respondents and nonrespondents) are rare, and it is unclear whether such linkages are useful for addressing nonresponse bias (Bee, Gathright, and Meyer 2015; Sakshaug, Antoni, and Sauckel 2017). Two relevant outcomes associated with this type of linkage are the linkage rate—the proportion of sample units that can be successfully linked to the target administrative database—and how representative the linked cases are of the entire data file. In nearly all indirect linkage applications there is a failure to link a subset of units, either because the linkage criterion is not met or because the target database does not contain a record for every sample unit. In both cases, the failure to link all units introduces the potential for linkage bias. 
Bee, Gathright, and Meyer (2015) investigated these issues in a study conducted at the US Census Bureau where sampled addresses from the 2011 Current Population Survey (CPS) Annual Social and Economic Supplement were indirectly linked to 2010 federal tax records compiled from the Internal Revenue Service (IRS) 1040 form. Household linkage rates of seventy-nine and seventy-six percent were reported for respondents and nonrespondents, respectively. Some linkage biases were reported: for example, low-income households were linked at a lower rate than higher-income households, likely due to the fact that lower-income households are not required to file tax returns and are, thus, underrepresented in the record base. In Germany, Sakshaug, Antoni, and Sauckel (2017) report the results of an indirect linkage performed on a general population sample of individuals from the German Panel Study “Labour Market and Social Security (PASS)” to a federal employment database maintained by the Institute for Employment Research (IAB) of the Federal Employment Agency (BA). The authors report a linkage rate of about sixty percent under a strict linkage criterion and eighty percent under a more relaxed criterion with similar linkage rates between respondents and nonrespondents. Older age groups, self-employed, and civil servants, who are known to be underrepresented in the employment database, were linked at lower rates compared to their counterparts. 2.3 Linked Administrative Data for Addressing Nonresponse Bias The previously mentioned case studies highlight the difficulty in achieving one hundred percent linkage rates for general population samples. However, for the majority of sample cases that can be linked, a key question is whether the linked administrative data are useful for studying nonresponse bias. 
Bee, Gathright, and Meyer (2015) addressed this question by examining whether CPS respondents differed from nonrespondents with respect to key variables derived from the linked tax records. Most notably, they found very little evidence of nonresponse bias in the CPS with respect to adjusted gross income as reported in the filed 1040 form. In contrast, they reported evidence of nonresponse bias in the number of dependents, receipt of income from certain sources, the proportion of units with a married filer, and some demographic characteristics. A further issue that is underexplored is whether indirectly linked administrative data are useful for nonresponse bias adjustment. The above literature review suggests that these data likely possess key properties amenable to bias correction, but whether they substantially 1) improve model fit and 2) reduce nonresponse bias relative to paradata alone is unclear. To investigate this issue, we make use of the linked PASS survey and federal employment database of the IAB, reported in Sakshaug, Antoni, and Sauckel (2017), to address the following research questions:
RQ1: To what extent are linked administrative data from a federal employment database correlated with substantive survey variables and the response outcome? How do these correlations compare to those involving standard paradata?
RQ2: Does the inclusion of linked administrative variables in nonresponse adjustment models substantially improve model fit over paradata-only adjustment models?
RQ3: Does the inclusion of linked administrative variables in nonresponse-adjustment weights reduce nonresponse bias to a greater extent than paradata-only adjustment weights?
RQ4: To what extent do weighted survey estimates differ depending on whether linked administrative data are included in the nonresponse weighting procedure?
3. DATA AND METHODS
3.1 Survey Data Source
The PASS is a longitudinal study of households conducted annually by the IAB in Germany since 2006 (Trappmann, Beste, Bethmann, and Müller 2013). The study was designed to measure the economic and social circumstances of individuals and their households in the aftermath of the reorganization of the welfare and unemployment benefits system (the “Hartz-Reforms”), which included the introduction of a new means-tested benefit scheme coined Unemployment Benefit II (UB II; Möller and Walwei 2009). The PASS study is based on separate, independent samples of UB II benefit recipients and residents of the general population. The UB II sample is drawn from an administrative list of all UB II recipients, and the general population sample is drawn from population lists amassed from municipality registration offices; in Germany, resident registration is mandatory. Both samples are drawn using a stratified cluster sampling design with (approximately) equal probabilities of selection. Each sample is composed of named individuals, and all members of the sampled person’s household aged fifteen and older are interviewed. A household-level interview is completed by the household member most knowledgeable about the household situation. Data collection is carried out using a sequential mixed-mode design involving computer-assisted personal and telephone interviewing. Full details about the PASS methodology are available in Trappmann, Beste, Bethmann, and Müller (2013). Building on Sakshaug, Antoni, and Sauckel (2017), we utilize the 2011 PASS wave five general population refreshment sample. The total drawn sample consists of 6,237 individuals, of which 1,540 completed the PASS interview for a response rate of 25.1 percent (Response Rate 1; AAPOR 2016). In evaluating the utility of the administrative data for nonresponse adjustment, we make use of eight household-level and eight person-level PASS survey variables.
The household-level variables include household size, presence of child under fifteen years of age, household ownership, material deprivation item index and material deprivation activity index (Berg et al. 2012), net income (in Euros) in past month, household savings, and received UB II at least once since 2009. The person-level variables include age (in years), sex, foreign citizenship, currently employed, total number of employer changes in lifetime, if ever registered as regular unemployed, presence of officially recognized disability, and if currently receives statutory pension payments. These variables and their coding schemes are described in Appendix table A.1 of the online supplementary material. The variables were selected based on their popularity among data users and discussions with the PASS team. Additionally, we make use of the following paradata variables collected during the PASS recruitment: total number of contact attempts, case initially refused interview, refusal conversion was applied, case involved an interviewer switch, and case involved a mode switch. 3.2 Administrative Data Source The administrative data source linked to the PASS wave five general population refreshment sample is the employment database of the IAB. The IAB database is constructed from administrative processes rendered by the BA, including social security notifications submitted by employers regarding their employees and registered activities concerning unemployment, job search, and participation in active labor market programs (Jacobebbinghaus and Seth 2007; Antoni, Ganzer, and vom Berge 2016). The culmination of these processes results in a database that covers the majority of the working population in Germany. 
Underrepresented groups include civil servants, the self-employed, and homemakers, who are exempt from making social security contributions.1 In the following analyses, we consider eight administrative variables: received UB II at least once since 2009, total number of employment spells in lifetime, currently employed, ever received regular unemployment benefit, age, sex, average daily wage, and foreign citizenship. These variables are commonly used in economic studies utilizing the IAB database (Boockmann, Ammermüller, Zwick, and Maier 2007; Kreuter, Müller, and Trappmann 2010; Baumgarten 2013; Burr, Rauch, Rose, Tisch, and Tophoven 2015). We note that each of these administrative variables has a similar counterpart among the selected PASS survey variables. For instance, age, sex, and UB II receipt are closely measured in both data sources. We expect moderate to high correlations between some of these counterparts, but others (e.g., employment variables) are likely to have lower correspondence due to construct and data generation differences. For example, only employment spells that are subject to social security contributions are included in the administrative counts, whereas the self-reported counts can include a broader range of employment spells. We believe this situation is representative of survey practice as linked administrative variables are likely to have similarities with the collected survey variables, but are not designed to be perfect replacements for them. 3.3 Linkage Procedures and Evaluation Before describing the linkage procedures, we reiterate that the purpose of linking the IAB employment database to the PASS general population sample is to maximize the amount of auxiliary information available for both respondents and nonrespondents. In Germany, record linkages are subject to strict data protection regulations and often require authorization from survey participants (Federal Data Protection Act 2013). 
The PASS survey routinely links interview data—conditional on respondent consent—to the IAB employment database. However, our use of linkage differs in the sense that only the sample paradata are linked to the IAB database. That is, only process-oriented survey variables (e.g., sample disposition codes, number of call attempts), and no substantive survey variables, are linked to the administrative data. The IAB legal team confirmed that linking process-oriented variables does not require consent. Full details of the linkage procedure can be found in Sakshaug, Antoni, and Sauckel (2017). Here we provide a basic summary of the process. An indirect linkage procedure was performed on the 6,237 persons drawn from municipality records for the PASS wave five refreshment sample. Other members of the sampled person’s household were excluded from the linkage. The linkage procedure was based on eight non-unique linkage variables available in the sampling frame and IAB administrative data: first name, last name, zip code, city name, street name, house number, sex, and binary birth cohort (born before 1945 or after). Numerous preprocessing steps were implemented to standardize these variables. The actual linkage was carried out using three subsequent procedures: 1) deterministic linkage; 2) distance-based linkage; and 3) probabilistic linkage. The distance-based and probabilistic linkage procedures were performed using the Merge ToolBox (MTB) software package developed by the German Record Linkage Center.2 The deterministic linkage was performed using Stata. A match certainty index (MCI) was constructed to classify the strictness of a link. MCI values ranged from zero to seventeen with zero denoting a non-link and values one to seventeen denoting links obtained at thresholds of increasing strictness (i.e., higher values correspond to a higher certainty that a true link has been identified).3 We classify a link as having an MCI value between thirteen and seventeen. 
This is a slightly stricter range than the one used by Sakshaug, Antoni, and Sauckel (2017), who classified a restrictive link within the MCI range six to seventeen. The revised MCI range was adopted after inspecting the linkage distribution (provided in Appendix figure A.1 of the online supplementary material) and identifying a relatively sparse number of links for each step of the range seven to twelve. The revised MCI range, which in our view represents a more natural cut-off based on the empirical distribution, corresponds to an overall linkage rate of 58.6 percent or a total of 3,653 linked cases out of 6,237. A total of 875 respondents and 2,778 nonrespondents were linked with linkage rates of 56.8 and 60.3 percent, respectively. These 3,653 linked cases serve as the basis for answering the four research questions.
3.4 Statistical Analysis
3.4.1 RQ1: correlation between linked administrative variables and survey variables.
RQ1 is addressed by calculating absolute Pearson correlation coefficients for the eight administrative variables paired with the sixteen survey variables for the 625 (out of 875) respondents who consented to linkage of their interview data with the IAB administrative data. In addition, correlations between the administrative variables and the response outcome are presented based on all linked cases irrespective of linkage consent (n = 3,653). For comparison, the same correlations are presented for the paradata variables to assess the strength of their linear relationship with the substantive survey variables and the response outcome.
3.4.2 RQ2: incorporating linked administrative data in survey response models.
RQ2 is addressed by fitting two logistic regression models with survey participation (1 = response; 0 = nonresponse) as the dependent variable. The first model conditions on the aforementioned paradata variables only and the second model conditions on the paradata and linked administrative variables.
The impact of including the administrative variables in the response model is evaluated by assessing the statistical significance of the administrative data coefficients and looking for substantial improvement in model fit statistics, including McFadden’s Pseudo R² and area under the ROC curve (AUC) relative to the paradata-only model.
3.4.3 RQ3: utility of administrative data for nonresponse bias reduction.
RQ3 is addressed by assessing and comparing nonresponse bias for weighted estimates of the eight administrative variables. The weights are constructed using estimated response propensity scores (e.g., Brick and Kalton 1996) generated from the two regression models fitted from the previous analysis. For each model, the response propensity scores are generated, sorted from lowest to highest and divided into ten approximately equal-sized groups. The adjustment weight is calculated as the inverse of the average propensity score within each decile group. The same procedure is carried out using both regression models, yielding two sets of weights that vary in their level of covariate information: 1) paradata only; and 2) paradata and administrative data. Nonresponse bias for the estimated mean (or percentage) of each administrative variable is calculated by taking the difference between the (weighted or unweighted) estimate based on the linked respondents ($\bar{Y}_r$) and the estimate based on the linked respondents and nonrespondents ($\bar{Y}_n$):
$$\text{Nonresponse bias} = \bar{Y}_r - \bar{Y}_n.$$
A measure of absolute relative nonresponse bias (ARNB) is also reported which shows the amount of nonresponse bias in the linked respondent estimate relative to the linked sample (respondents and nonrespondents) estimate. Specifically, ARNB is calculated as:
$$\text{ARNB} = \left| \frac{\bar{Y}_r - \bar{Y}_n}{\bar{Y}_n} \right|.$$
3.4.4 RQ4: impact of administrative data on weighted survey estimates.
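The weighting-class construction and bias measures from section 3.4.3, which the RQ4 analysis reuses, can be illustrated with a minimal Python sketch. It is not the authors' Stata code: the function names and the toy data (two propensity groups instead of ten deciles) are hypothetical, and it assumes the propensity groups are formed over all linked cases while the resulting weights are applied to respondents only.

```python
from statistics import mean


def weighting_class_weights(propensities, n_groups=10):
    """Sort cases by estimated response propensity, split them into
    n_groups roughly equal-sized groups, and assign every case the
    inverse of the mean propensity in its group."""
    order = sorted(range(len(propensities)), key=lambda i: propensities[i])
    n = len(order)
    weights = [0.0] * n
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if members:
            w = 1.0 / mean([propensities[i] for i in members])
            for i in members:
                weights[i] = w
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def nonresponse_bias(y, responded, weights=None):
    """Return (bias, ARNB in percent): the respondent mean (weighted if
    weights are given) minus the unweighted mean over all linked cases."""
    if weights is None:
        weights = [1.0] * len(y)
    resp = [(v, w) for v, r, w in zip(y, responded, weights) if r]
    y_r = weighted_mean([v for v, _ in resp], [w for _, w in resp])
    y_n = mean(y)
    bias = y_r - y_n
    return bias, abs(bias / y_n) * 100


# Hypothetical toy data: 20 linked cases, a binary administrative variable
# that is 1 only in the low-propensity half of the sample.
propensity = [0.2] * 10 + [0.8] * 10
responded = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
admin_var = [1] * 10 + [0] * 10

weights = weighting_class_weights(propensity, n_groups=2)
bias_unw, arnb_unw = nonresponse_bias(admin_var, responded)
bias_wtd, arnb_wtd = nonresponse_bias(admin_var, responded, weights)
```

In this constructed example the unweighted respondent mean understates the administrative variable, and the weights remove the bias entirely because response depends only on the propensity group. Real data would rarely behave this cleanly.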
To answer RQ4, we apply the two sets of adjustment weights from the previous analysis to the sixteen survey variables and calculate the difference between the weighted ($\bar{Y}_{r,\text{wtd}}$) and the unweighted ($\bar{Y}_{r,\text{unwtd}}$) estimates to compare the impact of both sets of weights on the survey estimates:
$$\text{Difference between weighted and unweighted estimates} = \bar{Y}_{r,\text{wtd}} - \bar{Y}_{r,\text{unwtd}}.$$
Standard errors and coefficients of variation for each estimate are also reported to assess the impact of the weights on the variability of the estimates. A measure of absolute relative difference (ARD) is presented to quantify the magnitude of the difference between the weighted and unweighted point estimates in relation to the unweighted estimate:
$$\text{ARD} = \left| \frac{\bar{Y}_{r,\text{wtd}} - \bar{Y}_{r,\text{unwtd}}}{\bar{Y}_{r,\text{unwtd}}} \right|.$$
Both ARNB and ARD measures are reported in percentage terms by multiplying the above formulae by 100. All analyses are performed using the survey functions in Stata 14.1.
4. RESULTS
4.1 RQ1: Correlation between Linked Administrative Variables and Survey Variables
In the first analysis, we compare the linked administrative variables and paradata variables with respect to their strength of association with the survey variables. Absolute Pearson correlation coefficients for the eight administrative variables paired with the sixteen survey variables are provided in figure 1 (see Appendix table A.2 in the online supplementary material for a tabular version of the correlations). The absolute correlations span the full range from zero to almost one.
The majority of the correlations are small: ninety-six (out of 128 possible correlations or seventy-five percent) lie in the range 0.00 to 0.20, twenty-two (or seventeen percent) lie between 0.20 and 0.40, and the remaining ten correlations (eight percent) exceed 0.40.⁴ As expected, the largest correlations are observed for administrative variables that have similar counterparts in the survey data (e.g., age [0.97], gender [0.99], UB II receipt since 2009 [0.77], ever registered as regular unemployed/received regular unemployment benefit [0.71]). Moderate to high correlations exist for several nonsimilar pairs of survey-administrative variables (e.g., employment status-average daily wage [0.64], household savings-UB II receipt [0.43], material deprivation activity index-UB II receipt [0.41]). Overall, the administrative variables that contribute to the greatest number of correlations with the survey variables of at least 0.20 are UB II receipt (eight), average daily wage (seven), ever received regular unemployment benefit (six), and age (five).
Figure 1. Absolute Pearson Correlations of the Administrative Variables and Paradata Variables Paired with the Survey Variables and Response Outcome.
For comparison, figure 1 also shows the correlations between the survey and paradata variables, which appear to be weaker than the survey-administrative variable pairs; the maximum correlation between the survey and paradata variables is 0.22 (a tabular version of all survey-paradata correlations is provided in Appendix table A.3 of the online supplementary material). This pattern persists even when the correlations between the similar survey-administrative variable pairs are ignored.
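As a reading aid for the correlation summary in this section, the following minimal Python sketch shows how an absolute Pearson correlation can be computed for one variable pair and how a set of such correlations can be tallied into the three bands used in the text. The function names and the handling of values that fall exactly on a band boundary are assumptions made for illustration; the article does not specify them.

```python
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy / sqrt(sxx * syy))


def tally_bands(correlations):
    """Count correlations per band. Boundary handling is an assumption:
    values of exactly 0.20 or 0.40 are placed in the middle band."""
    bands = {"below 0.20": 0, "0.20 to 0.40": 0, "above 0.40": 0}
    for r in correlations:
        if r < 0.20:
            bands["below 0.20"] += 1
        elif r <= 0.40:
            bands["0.20 to 0.40"] += 1
        else:
            bands["above 0.40"] += 1
    return bands


# Shares implied by the counts reported in the text (96, 22, and 10 of 128).
shares = [round(100 * k / 128) for k in (96, 22, 10)]
```

The `shares` line reproduces the rounded percentages quoted above (seventy-five, seventeen, and eight percent).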
Figure 1 also indicates that both paradata and administrative data are only weakly correlated with the response outcome; the maximum absolute correlations with the response outcome are 0.08 and 0.12 for the administrative data and paradata, respectively. From these analyses, we can confirm that the administrative data generally yield higher correlations with the substantive survey variables compared with the paradata, and that both paradata and administrative data are only weakly associated with the response outcome.
4.2 RQ2: Incorporating Linked Administrative Data in Survey Response Models
Next, we turn to the question of whether adding linked administrative variables as covariates in nonresponse adjustment models improves model fit over models that make use of paradata only. Table 1 presents two logistic regression models of response. Model one is the reduced model with paradata covariates only, and model two is the full model, which includes both paradata and administrative covariates. Two of the five paradata variables are statistically significant in each model: the number of contact attempts is positively associated, and switching interviewers is negatively associated with response. The magnitude of the association (and standard errors) with the paradata variables does not change with the addition of the administrative variables, indicating a different type of relationship between these two sets of variables. In model two, only two of the eight administrative variables yield a statistically significant association with response: age and foreign citizenship. Older age groups (44–53 and 54 or older) and the missing age group are more likely to respond relative to the youngest age group (30 or younger). There is no statistically significant difference in response between foreign and German citizens, but individuals for whom foreign status is missing are significantly less likely to respond compared with foreign citizens.
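For readers less familiar with the two fit statistics used to compare the response models, the following minimal Python sketch shows how McFadden's pseudo R² and the AUC could be computed from a vector of response indicators and predicted response probabilities. The function names are hypothetical and the sketch does not fit the logistic models themselves; the article's own figures were produced in Stata.

```python
from math import log


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y (0/1) given probabilities p."""
    return sum(log(pi) if yi else log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """1 - LL(model) / LL(null), where the null model predicts the
    overall response rate for every case."""
    p0 = sum(y) / len(y)
    ll_null = log_likelihood(y, [p0] * len(y))
    return 1 - log_likelihood(y, p) / ll_null


def auc(y, p):
    """Share of respondent/nonrespondent pairs in which the respondent
    has the higher predicted probability (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi]
    neg = [pi for yi, pi in zip(y, p) if not yi]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

A model that predicts the overall response rate for every case yields a pseudo R² of zero and an AUC of 0.5, which is the baseline against which both reported values should be read.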
In terms of model fit, there is only minor improvement when the administrative covariates are added to the model: the Pseudo R² increases from 0.27 to 0.29, and the area under the ROC curve (AUC) increases from 0.83 to 0.85. In summary, there is only slight evidence that adding administrative variables to the response model improves model fit.

Table 1. Logistic Regression Models of Survey Response on Paradata and Administrative Variables (n = 3,653)

| Variable | Model 1: Paradata only, Coef. | Model 1: Paradata only, Std. Error | Model 2: Paradata + administrative data, Coef. | Model 2: Paradata + administrative data, Std. Error |
|---|---|---|---|---|
| Paradata variables | | | | |
| Case initially refused interview | −0.15 | 0.22 | −0.18 | 0.23 |
| Refusal conversion was applied | −0.42 | 0.35 | −0.47 | 0.36 |
| Case involved an interviewer switch | −1.85*** | 0.33 | −1.93*** | 0.33 |
| Total number of contact attempts | | | | |
| 1–2 | REF | REF | REF | REF |
| 3–5 | 3.58*** | 0.27 | 3.60*** | 0.27 |
| 6–10 | 4.87*** | 0.29 | 4.91*** | 0.28 |
| 11 or more | 5.01*** | 0.34 | 5.11*** | 0.33 |
| Case involved a mode switch | 0.20 | 0.34 | 0.18 | 0.34 |
| Administrative variables | | | | |
| Received UB II at least once since 2009 | − | − | −0.23 | 0.15 |
| Total number of employment spells in lifetime | | | | |
| 0–2 | − | − | REF | REF |
| 3–4 | − | − | 0.07 | 0.17 |
| 5–7 | − | − | 0.07 | 0.20 |
| 8 or more | − | − | 0.16 | 0.21 |
| Currently employed | − | − | 0.29 | 1.12 |
| Avg. daily wage (in Euros) | | | | |
| 0 | − | − | REF | REF |
| 1–32 | − | − | −0.31 | 1.14 |
| 33–67 | − | − | −0.38 | 1.16 |
| 68–100 | − | − | −0.30 | 1.14 |
| 101 or higher | − | − | −0.41 | 1.16 |
| Ever received regular unemployment benefit | − | − | −0.18 | 0.13 |
| Age (in years) | | | | |
| 30 or younger | − | − | REF | REF |
| 31–43 | − | − | 0.06 | 0.16 |
| 44–53 | − | − | 0.36* | 0.16 |
| 54 or older | − | − | 0.76*** | 0.20 |
| Missing | − | − | 9.04*** | 1.51 |
| Male | − | − | −0.01 | 0.11 |
| Foreign citizenship | | | | |
| Yes | − | − | REF | REF |
| No | − | − | 0.48 | 0.31 |
| Missing | − | − | −8.12*** | 1.50 |
| Intercept | −4.03*** | 0.21 | −4.66*** | 0.46 |
| Model fit statistics | | | | |
| AUC | 0.83 | | 0.85 | |
| Pseudo R² | 0.27 | | 0.29 | |

† p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.

4.3 RQ3: Utility of Administrative Data for Nonresponse Bias Reduction
Here, we turn to the issue of whether including administrative variables in nonresponse weighting adjustments enhances nonresponse bias reduction. Table 2 shows estimates of nonresponse bias and absolute relative nonresponse bias (ARNB) for each of the eight administrative variables estimated under the alternative weighting schemes (paradata only versus paradata-administrative data). The rationale for assessing nonresponse bias on the administrative data rather than on the survey data is simply due to the availability of the administrative data for both respondents and nonrespondents and the lack of high-quality benchmark information for the survey data.⁵ For comparison, nonresponse biases for the unweighted estimates are also shown. We remind readers that these estimates are not based on the full PASS sample, but rather the subset of linked cases.
Table 2. Estimates of Means/Percentages and Nonresponse Bias in Weighted and Unweighted Administrative Variables Linked respondents (n = 875) Nonresponse bias % Abs.
relative nonresponse bias (ARNB) Administrative variables Linked sample (n =3, 653) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) UBII_2009 (%) 18.07 14.06 18.40 19.48 −4.01 0.33 1.41 22.19 1.83 7.80 EMPLOYED (%) 60.75 62.17 56.85 64.06 1.42 −3.90 3.31 2.34 6.42 5.45 EMP_SPELLS 5.30 5.28 5.28 5.42 −0.02 −0.02 0.12 0.32 0.36 2.25 UNEMP_BEN (%) 57.10 56.91 54.85 52.97 −0.19 −2.25 −4.13 0.33 3.94 7.23 AGE 43.20 45.32 45.58 44.37 2.11 2.37 1.17 4.89 5.49 2.70 FOREIGN (%) 10.54 6.74 6.96 9.05 −3.80 −3.58 −1.49 36.05 33.97 14.14 MALE (%) 52.70 53.71 51.68 51.63 1.01 −1.02 −1.07 1.92 1.94 2.03 WAGE 43.04 45.57 35.82 39.00 2.53 −7.22 −4.04 5.88 16.78 9.38 Average ARNB (%) – – – – – – – 9.24 8.84 6.37 Linked respondents (n = 875) Nonresponse bias % Abs. relative nonresponse bias (ARNB) Administrative variables Linked sample (n =3, 653) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) UBII_2009 (%) 18.07 14.06 18.40 19.48 −4.01 0.33 1.41 22.19 1.83 7.80 EMPLOYED (%) 60.75 62.17 56.85 64.06 1.42 −3.90 3.31 2.34 6.42 5.45 EMP_SPELLS 5.30 5.28 5.28 5.42 −0.02 −0.02 0.12 0.32 0.36 2.25 UNEMP_BEN (%) 57.10 56.91 54.85 52.97 −0.19 −2.25 −4.13 0.33 3.94 7.23 AGE 43.20 45.32 45.58 44.37 2.11 2.37 1.17 4.89 5.49 2.70 FOREIGN (%) 10.54 6.74 6.96 9.05 −3.80 −3.58 −1.49 36.05 33.97 14.14 MALE (%) 52.70 53.71 51.68 51.63 1.01 −1.02 −1.07 1.92 1.94 2.03 WAGE 43.04 45.57 35.82 39.00 2.53 −7.22 −4.04 5.88 16.78 9.38 Average ARNB (%) – – – – – – – 9.24 8.84 6.37 Table 2. Estimates of Means/Percentages and Nonresponse Bias in Weighted and Unweighted Administrative Variables Linked respondents (n = 875) Nonresponse bias % Abs. 
relative nonresponse bias (ARNB) Administrative variables Linked sample (n =3, 653) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) UBII_2009 (%) 18.07 14.06 18.40 19.48 −4.01 0.33 1.41 22.19 1.83 7.80 EMPLOYED (%) 60.75 62.17 56.85 64.06 1.42 −3.90 3.31 2.34 6.42 5.45 EMP_SPELLS 5.30 5.28 5.28 5.42 −0.02 −0.02 0.12 0.32 0.36 2.25 UNEMP_BEN (%) 57.10 56.91 54.85 52.97 −0.19 −2.25 −4.13 0.33 3.94 7.23 AGE 43.20 45.32 45.58 44.37 2.11 2.37 1.17 4.89 5.49 2.70 FOREIGN (%) 10.54 6.74 6.96 9.05 −3.80 −3.58 −1.49 36.05 33.97 14.14 MALE (%) 52.70 53.71 51.68 51.63 1.01 −1.02 −1.07 1.92 1.94 2.03 WAGE 43.04 45.57 35.82 39.00 2.53 −7.22 −4.04 5.88 16.78 9.38 Average ARNB (%) – – – – – – – 9.24 8.84 6.37 Linked respondents (n = 875) Nonresponse bias % Abs. relative nonresponse bias (ARNB) Administrative variables Linked sample (n =3, 653) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) No weights Weighted (paradata) Weighted (paradata + admin data) UBII_2009 (%) 18.07 14.06 18.40 19.48 −4.01 0.33 1.41 22.19 1.83 7.80 EMPLOYED (%) 60.75 62.17 56.85 64.06 1.42 −3.90 3.31 2.34 6.42 5.45 EMP_SPELLS 5.30 5.28 5.28 5.42 −0.02 −0.02 0.12 0.32 0.36 2.25 UNEMP_BEN (%) 57.10 56.91 54.85 52.97 −0.19 −2.25 −4.13 0.33 3.94 7.23 AGE 43.20 45.32 45.58 44.37 2.11 2.37 1.17 4.89 5.49 2.70 FOREIGN (%) 10.54 6.74 6.96 9.05 −3.80 −3.58 −1.49 36.05 33.97 14.14 MALE (%) 52.70 53.71 51.68 51.63 1.01 −1.02 −1.07 1.92 1.94 2.03 WAGE 43.04 45.57 35.82 39.00 2.53 −7.22 −4.04 5.88 16.78 9.38 Average ARNB (%) – – – – – – – 9.24 8.84 6.37 The table shows, for example, that UB II receipt is underrepresented among the linked respondents: the unweighted percentage is 14.06 percent among the linked respondents and 18.07 percent among the linked respondents and nonrespondents, which yields 
a nonresponse bias of −4.01 percentage points and an ARNB of 22.19 percent, a moderately large bias. Both the paradata-only and combined paradata-administrative data weighting schemes succeed in reducing the ARNB to a more reasonable level (1.83 and 7.80 percent, respectively), with the paradata-only weights slightly outperforming the combined data weights. In contrast, the foreign citizenship variable, which has the largest ARNB overall, yields a substantial reduction in ARNB under the combined paradata-administrative data weighting scheme (from 36.05 to 14.14 percent) compared with only a minor reduction under the paradata-only weighting scheme (from 36.05 to 33.97 percent). In total, the combined paradata-administrative data weight outperforms the paradata-only weight for half of the estimates. However, for some of these estimates (e.g., currently employed, average daily wage), both sets of weights increase rather than decrease the level of bias. Overall, the average ARNB for the paradata-administrative data weighted estimates is about 2.5 percentage points smaller than in the paradata-only case (6.37 versus 8.84 percent, respectively), which equates to a relative reduction of about twenty-eight percent due to the inclusion of the linked administrative data in the weighting process.

4.4 RQ4: Impact of Administrative Data on Weighted Survey Estimates

The final analysis examines the impact that the linked administrative data have on the weighted estimates of the actual survey variables. Table 3 shows the weighted and unweighted estimates of the sixteen selected PASS survey variables and corresponding ninety-five percent confidence intervals (CIs). The weights are based on the full set of variables used in Models 1 and 2 from table 1. It is readily apparent that the paradata-administrative data weighted estimates are similar to the paradata-only weighted estimates.
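As a brief aside on the bias measures used for table 2, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (`nonresponse_bias`, `arnb`, `relative_reduction`) and the dictionary layout are hypothetical, and the ARNB formula (absolute bias divided by the linked-sample estimate, times 100) is an assumption inferred from the published figures, which it reproduces for the two rows shown. The input values are copied from table 2.

```python
# Illustrative sketch only: helper names are hypothetical, and the ARNB
# formula is inferred from the values reported in table 2.

def nonresponse_bias(respondent_est, linked_sample_est):
    """Signed bias: respondent-based estimate minus the linked-sample estimate."""
    return respondent_est - linked_sample_est


def arnb(respondent_est, linked_sample_est):
    """Absolute relative nonresponse bias, as a percent of the linked-sample estimate."""
    bias = nonresponse_bias(respondent_est, linked_sample_est)
    return 100.0 * abs(bias) / abs(linked_sample_est)


def relative_reduction(before, after):
    """Percent reduction when moving from `before` to `after`."""
    return 100.0 * (before - after) / before


# Two rows of table 2 (percentages): linked sample versus linked respondents
# under no weights, paradata-only weights, and paradata + admin weights.
TABLE2 = {
    "UBII_2009": {"linked_sample": 18.07, "unweighted": 14.06,
                  "paradata": 18.40, "paradata_admin": 19.48},
    "FOREIGN": {"linked_sample": 10.54, "unweighted": 6.74,
                "paradata": 6.96, "paradata_admin": 9.05},
}

for name, row in TABLE2.items():
    for scheme in ("unweighted", "paradata", "paradata_admin"):
        bias = nonresponse_bias(row[scheme], row["linked_sample"])
        rel = arnb(row[scheme], row["linked_sample"])
        print(f"{name:10s} {scheme:15s} bias={bias:+.2f}  ARNB={rel:.2f}%")

# Average ARNB: paradata-only (8.84) versus paradata + admin data (6.37).
print(f"relative reduction in average ARNB: {relative_reduction(8.84, 6.37):.1f}%")
```

Under these assumptions, the difference between the two average ARNB values is about 2.5 percentage points, while the relative reduction is about 28 percent of the paradata-only average; the two quantities are on different scales and should not be conflated.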
The ninety-five percent CIs under the two weighting schemes overlap for every estimate, indicating that including the administrative data in the weighting adjustment procedure does not substantially alter the survey estimates. This finding is further supported when examining the absolute relative difference (ARD) between the weighted and unweighted estimates; for the majority of the survey estimates, the ARD values under both weighting schemes differ by less than ten percentage points. Moreover, the average ARD values under each weighting scheme differ only by about five percentage points, suggesting that both weighting schemes have similar impact on the point estimates.

Table 3. Estimates of Means/Percentages and Differences Between Weighted and Unweighted Household- and Person-Level Survey Variables, Linked respondents (n = 875). Ninety-five percent CIs are shown in parentheses after each estimate.

| Survey variable | Estimate: No weights | Estimate: Weighted (paradata) | Estimate: Weighted (paradata + admin data) | Difference from unweighted: Weighted (paradata) | Difference from unweighted: Weighted (paradata + admin data) | % ARD: Weighted (paradata) | % ARD: Weighted (paradata + admin data) |
|---|---|---|---|---|---|---|---|
| Household-level variables | | | | | | | |
| Household size | 2.52 (2.38, 2.66) | 2.05 (1.86, 2.24) | 2.16 (1.98, 2.33) | −0.47 | −0.36 | 18.76 | 14.44 |
| Presence of child under 15 years of age (%) | 22.53 (18.75, 26.83) | 16.45 (12.82, 20.85) | 18.31 (14.40, 22.99) | −6.08 | −4.22 | 26.99 | 18.73 |
| Household ownership (%) | 42.51 (37.60, 47.57) | 30.84 (24.74, 37.69) | 29.28 (23.47, 35.85) | −11.67 | −13.23 | 27.45 | 31.12 |
| Material deprivation item index | 0.71 (0.61, 0.82) | 0.99 (0.57, 1.15) | 0.87 (0.50, 1.25) | 0.28 | 0.16 | 39.66 | 22.50 |
| Material deprivation activity index | 2.97 (2.78, 3.17) | 3.21 (2.68, 3.73) | 2.97 (2.36, 3.57) | 0.23 | −0.01 | 7.77 | 0.20 |
| Net income (in Euros) in past month | 2457.31 (2296.55, 2618.07) | 2038.11 (1758.95, 2317.27) | 2101.49 (1788.50, 2414.47) | −419.20 | −355.82 | 17.06 | 14.48 |
| Household savings | 4.11 (3.83, 4.38) | 3.61 (3.11, 4.11) | 3.67 (3.17, 4.16) | −0.49 | −0.44 | 11.99 | 10.69 |
| Received UB II at least once since 2009 (%) | 13.29 (10.55, 16.62) | 14.98 (8.95, 24.02) | 16.15 (10.57, 23.91) | 1.69 | 2.86 | 12.72 | 21.52 |
| Person-level variables | | | | | | | |
| Age in years | 45.11 (43.84, 46.39) | 47.39 (44.00, 50.78) | 43.28 (40.360, 46.31) | 2.28 | −1.83 | 5.05 | 4.06 |
| Female (%) | 46.10 (42.33, 49.92) | 43.65 (35.83, 51.81) | 42.18 (34.78, 49.95) | −2.45 | −3.92 | 5.31 | 8.50 |
| Foreign born (%) | 13.17 (9.84, 17.41) | 12.46 (7.67, 19.62) | 15.37 (7.94, 27.66) | −0.71 | 2.20 | 5.39 | 16.70 |
| Currently employed (%) | 51.54 (47.95, 55.12) | 39.74 (32.01, 48.01) | 42.10 (33.12, 51.63) | −11.80 | −9.44 | 22.89 | 18.32 |
| Total number of employer changes in lifetime | 3.16 (2.82, 3.49) | 3.37 (2.80, 3.95) | 3.25 (2.60, 3.91) | 0.22 | 0.10 | 6.84 | 3.04 |
| Ever been registered as unemployed (%) | 41.00 (34.81, 47.49) | 43.85 (32.98, 55.33) | 44.18 (32.71, 56.30) | 2.85 | 3.18 | 6.95 | 7.76 |
| Has an officially recognized disability (%) | 12.56 (10.19, 15.39) | 18.11 (11.74, 26.88) | 14.14 (9.85, 19.89) | 5.55 | 1.58 | 44.19 | 12.58 |
| Currently receives statutory pension payments (%) | 17.56 (13.87, 21.98) | 23.85 (16.00, 34.00) | 16.55 (11.29, 23.62) | 6.29 | −1.01 | 35.82 | 5.75 |
| Average ARD: Overall (%) | – | – | – | – | – | 18.43 | 13.15 |

A limitation of this analysis is the absence of benchmark data to validate whether any shifts in the weighted survey estimates reflect an actual reduction in nonresponse bias. We can only speculate on the possible impact of the weights by gleaning information about the direction of nonresponse bias observed for similarly measured administrative variables studied in the previous section (section 4.3). In particular, we consider the two variables found to be most affected by nonresponse bias: UB II receipt since 2009 and foreign citizenship. UB II receipt measured in the administrative data was found to be underrepresented among linked respondents, and both sets of weights succeeded in increasing the representation of UB II recipients and reducing nonresponse bias (see table 2). The survey estimate of UB II experiences a similar increase under both weighting schemes, suggesting that both weighting approaches reduce nonresponse bias for this variable. Foreign citizenship, as measured in the administrative data, was also underrepresented among linked respondents, and both weighting approaches reduced nonresponse bias by shifting the estimate upward, with the combined paradata-administrative data weight yielding a much larger bias reduction than the paradata-only weight.
A similar picture emerges in the survey data as only the combined-data weighted estimate of foreign born experiences an upward shift relative to the unweighted estimate; thus, using the administrative data in the weighting process may do a better job of reducing nonresponse bias for this particular survey item relative to the standard (paradata-only) weighting approach.

In addition to the point estimates, it is also important to consider the impact of the weighting schemes on the variances of the survey estimates. We examined the standard errors and coefficients of variation (CV) for the survey estimates under the alternative weighting schemes (see Appendix table A.4 in the online supplementary material). In general, only minor CV differences exist between the two weighting approaches. The combined paradata-administrative data weight succeeds in reducing the CV over the paradata-only weight for half of the estimates. Little and Vartivarian (2005) concluded that the variance of an estimate should decrease when the weighting variables are highly correlated with the variable of interest. This appears to be the case for the UB II receipt item, which is highly correlated (0.77) with its administrative counterpart. This item experiences the largest CV reduction after incorporating the administrative variables into the weighting process (from 0.25 to 0.20). Thus, the administrative variables appear to reduce both nonresponse bias and variance for this survey item. The foreign citizenship item, on the other hand, which has a relatively weaker correlation with its administrative counterpart (0.43), is affected by an increased CV (from 0.23 to 0.31) when the administrative variables are used in the weighting procedure. Hence, the speculative reduction in nonresponse bias for this item is not accompanied by a reduction in variability under the enhanced weighting scheme.

5. DISCUSSION

The results of this case study can be distilled into five main findings.
First, although the linked administrative variables were only weakly associated with the majority of substantive survey variables, several pairs of variables from the two data sources produced moderate-to-high correlations. Second, correlations with the substantive survey variables were generally higher for the linked administrative variables than for the process-oriented paradata variables, a finding which held even after excluding the similarly measured survey-administrative variable pairs from the comparison. However, both paradata and administrative data were poorly associated with survey participation. Third, adding linked administrative variables to the response propensity model did not substantially improve model fit relative to the paradata-only response model, and only a few administrative variables were statistically significant predictors of response. Fourth, despite weak associations with the response outcome, incorporating the linked administrative variables into the nonresponse adjustment procedure reduced the average relative nonresponse bias to a greater extent than the paradata-only adjustment procedure. Lastly, utilizing the administrative data did not substantially impact the survey-weighted estimates and their variances compared with the paradata-only weighted estimates; however, there was an indication that the administrative variables reduced nonresponse bias and variance for a particular survey item (UB II benefit receipt) that was highly correlated with the administrative data, a result that is consistent with the findings reported in Little and Vartivarian (2005).

It is common practice to use federal administrative records in survey research, but they are significantly underutilized as a source of auxiliary data for addressing nonresponse bias in general population samples, particularly in applications where direct linkages are not possible.
In light of this situation, it is useful to know that indirectly linking federal administrative data to survey samples can yield benefits in terms of assessing and reducing nonresponse bias for the linked cases. The finding that linked administrative data outperform process-oriented paradata in terms of their associations with the survey variables and in reducing nonresponse bias is a useful step forward in the search for powerful auxiliary information capable of combating the effects of nonresponse.

While indirectly linked federal administrative records may offer potential benefits for addressing nonresponse bias, there are a number of technical issues that warrant attention before the procedure can be used in a production environment. For example, inevitably there will be sample cases that cannot be linked to the administrative database, as was observed in the present study and in other studies involving auxiliary data linkages (Raghunathan and Van Hoewyk 2008; DiSogra, Dennis, and Fahimi 2010; Pasek et al. 2014; Sinibaldi, Kreuter, and Trappmann 2014; West, Wagner, Hubbard, and Hu 2015). As such, one cannot draw definitive conclusions about nonresponse bias in the survey as a whole based only on the subset of cases that can be linked. It is possible to increase the linkage rate by relaxing the linkage criterion, but this strategy introduces a trade-off, as it may increase the likelihood of false-positive links and weaken associations with the substantive survey variables (see footnote 4), thus diminishing the utility of the linked data and possibly rendering the linkage exercise moot. It is also important to note that the PASS-administrative data linkage was conducted “in-house” at the IAB, where both the PASS survey and IAB employment database are housed. This situation simplified the linkage process immensely from both a technical and a legal standpoint.
Conducting linkages with survey-administrative data sources belonging to different agencies poses additional challenges, including the need to obtain approvals from various stakeholders and reach agreement on data sharing procedures. While not insurmountable, such negotiations are not straightforward and may carry on longer than expected. Large-scale surveys that currently perform linkages with respondent interview data may have the advantage of an established relationship with the administrative data sponsor, potentially simplifying the process of extending the linkages to include the noninterviewed cases for nonresponse analysis.

Assuming these issues can be overcome, we recommend that indirectly linked administrative data be considered as a supplement to existing auxiliary data options for addressing nonresponse bias. Despite modest reductions in nonresponse bias and minimal impact on the weighted survey estimates and corresponding variances, our case study points to potentially strong associations between federal administrative data and substantive survey variables that could be leveraged to address more severe cases of nonresponse bias. We envision multiple ways in which these properties could be harnessed in future work, including studying prospective nonresponse bias in longitudinal surveys, monitoring sample representativeness during fieldwork, and informing data collection interventions in a responsive design framework.

Supplementary Materials

Supplementary materials are available online at academic.oup.com/jssam.

Footnotes

1 These subgroups comprise roughly 12.5 percent of the total population aged fifteen to sixty-five based on figures from the 2011 Microcensus data obtained from the Federal Statistical Office.

2 See www.record-linkage.de for more details on the German Record Linkage Center and the Merge ToolBox software.
3 Despite our relatively comprehensive list of non-unique identifiers, false-positive or one-to-many links are still a possibility. This can be due to missing, incorrect, imprecise, or (inconsistently) abbreviated names and addresses in either of the databases. One linkage variable, in particular, that contributed to several false-positive links as reported in the linkage evaluation by Sakshaug, Antoni, and Sauckel (2017) was the dichotomous indicator of birth cohort. This broadly defined indicator of age yielded several false links and one-to-many links with persons within the same household but from a different generation than the sampled person. The authors reported that increasing the strictness of a link (based on the MCI scale) substantially reduced the age discrepancies for the linked cases (p. 69).

4 We also assessed these correlations under a less strict linkage criterion (i.e., using MCI values between one and twelve). In general, we found that gradually lowering the linkage threshold monotonically dampens the correlations between the survey and administrative variables. This result was not unexpected given that the rate of false-positive links was found to be higher for the smaller MCI values as reported in Sakshaug, Antoni, and Sauckel (2017).

5 To avoid the circular use of the same administrative variable in both the weight generation and survey estimation processes, we remove the target administrative variable from the logistic regression model used to estimate the response propensity scores and resulting weights. This variable removal procedure is performed separately for each target administrative variable used in the survey estimation. For example, the UBII_2009 administrative variable was dropped from the logistic regression model during the process of estimating nonresponse bias for this variable in table 2.

REFERENCES

The American Association for Public Opinion Research (AAPOR)
(2016), Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys (9th ed.), AAPOR, available at https://www.aapor.org/AAPOR_Main/media/publications/Standard-Definitions20169theditionfinal.pdf. (Accessed March 24, 2018).

Antoni M., Ganzer A., vom Berge P. (2016), “Sample of Integrated Labour Market Biographies (SIAB) 1975-2014,” FDZ-Datenreport, 04/2016 (en), Nuremberg.

Antoni M., Bethmann A. (2018), “PASS-ADIAB – Linked Survey and Administrative Data for Research on Unemployment and Poverty,” Journal of Economics and Statistics [online], DOI: https://doi.org/10.1515/jbnst-2018-0002. Available at https://www.degruyter.com/view/j/jbnst.ahead-of-print/jbnst-2018-0002/jbnst-2018-0002.xml.

Baumgarten D. (2013), “Exporters and the Rise in Wage Inequality: Evidence from German Linked Employer–Employee Data,” Journal of International Economics, 90, 201–217.

Bee C. A., Gathright G. M. R., Meyer B. D. (2015), “Bias from Unit Non-Response in the Measurement of Income in Household Surveys,” paper presented at the Joint Statistical Meetings of the American Statistical Association, Seattle, WA, available at http://www.sole-jole.org/16068.pdf. (Accessed March 24, 2018).

Berg M., Cramer R., Dickmann D., Gilberg R., Jesske B., Kleudgen M., Bethmann A., Fuchs B., Trappmann M., Wurdack A. (2012), “Codebook and Documentation of the Panel Study ‘Labour Market and Social Security’ (PASS) Wave 5,” FDZ-Datenreport, 06/2012 (en), Institute for Employment Research, Nuremberg.

Blom E., Carlsson F. (1999), “Registers in Official Statistics: A Swedish Perspective,” Invited paper for the Joint/ECE/Eurostat Work Session on Registers and Administrative Records for Social and Demographic Statistics, Geneva, available at http://www.unece.org/fileadmin/DAM/stats/documents/1999/03/registers/14.e.pdf. (Accessed March 24, 2018).

Boockmann B., Ammermüller A., Zwick T., Maier M.
( 2007 ), “Do Hiring Subsidies Reduce Unemployment among the Elderly? Evidence from Two Natural Experiments,” ZEW Discussion Papers No. 07-001. Brick J. M. , Kalton G. ( 1996 ), “Handling Missing Data in Survey Research,” Statistical Methods in Medical Research , 5 , 215 – 238 . Google Scholar CrossRef Search ADS PubMed Burr H. , Rauch A. , Rose U. , Tisch A. , Tophoven S. ( 2015 ), “Employment Status, Working Conditions and Depressive Symptoms among German Employees Born in 1959 and 1965,” International Archives of Occupational and Environmental Health , 88 , 731 – 741 . Google Scholar CrossRef Search ADS PubMed Diez Roux A. V. ( 2001 ), “Investigating Neighborhood and Area Effects on Health,” American Journal of Public Health , 91 , 1783 – 1789 . Google Scholar CrossRef Search ADS PubMed DiSogra C. , Dennis J. M. , Fahimi M. ( 2010 ), “On the Quality of Ancillary Data Available for Address-Based Sampling,” paper presented at the Joint Statistical Meetings of the American Statistical Association, Vancouver, British Columbia, available at http://www.knowledgenetworks.com/ganp/docs/jsm2010/On-the-Quality-of-Ancillary-ABS-2010-JSM-submission.pdf. (Accessed March 24, 2018). Federal Data Protection Act ( 2013 ), “Admissibility of Data Collection, Processing and Use,” Bundesministerium der Justiz und für Verbraucherschutz, available at http://www.gesetze-im-internet.de/englisch_bdsg/index.html. (Accessed March 24, 2018). Freedman V. A. , McGonagle K. , Andreski P. ( 2014 ), “The Panel Study of Income Dynamics Linked Medicare Claims Data,” PSID Technical Report Series 14-01, University of Michigan. Groves R. M. ( 2006 ), “Nonresponse Rates and Nonresponse Bias in Household Surveys,” Public Opinion Quarterly , 70 , 646 – 675 . Google Scholar CrossRef Search ADS Jacobebbinghaus P. , Seth S. ( 2007 ), “The German Integrated Employment Biographies Sample IEBS,” Schmollers Jahrbuch: Journal of Applied Social Science Studies , 127 , 335 – 342 . Knies G. , Burton J. 
( 2014 ), “Analysis of Four Studies in a Comparative Framework Reveals: Health Linkage Consent Rates on British Cohort Studies Higher than on UK Household Panel Surveys,” BMC Medical Research Methodology , 14 , 125 . Google Scholar CrossRef Search ADS PubMed Korbmacher J. , Czaplicki C. ( 2013 ), “Linking SHARE Survey Data with Administrative Records: First Experiences from SHARE-Germany,” in SHARE Wave 4: Innovations & Methodology , eds. Malter F. , Börsch-Supan A. , p. 4753, Munich : MEA, Max Planck Institute for Social Law and Social Policy . Kreuter F. ( 2013 ), Improving Surveys with Paradata: Analytic Uses of Process Information , Hoboken, NJ : Wiley . Google Scholar CrossRef Search ADS Kreuter F. , Kohler U. ( 2009 ), “Analyzing Contact Sequences in Call Record Data: Potential and Limitations of Sequence Indicators for Nonresponse Adjustments in the European Social Survey,” Journal of Official Statistics , 25 , 203 – 226 . Kreuter F. , Müller G. , Trappmann M. ( 2010 ), “Nonresponse and Measurement Error in Employment Research: Making Use of Administrative Data,” Public Opinion Quarterly , 74 , 880 – 906 . Google Scholar CrossRef Search ADS Kreuter F. , Olson K. , Wagner J. , Yan T. , Ezzati‐Rice T. M. , Casas‐Cordero C. , Lemay M. , Peytchev A. , Groves R. M. , Raghunathan T. E. ( 2010 ), “Using Proxy Measures and Other Correlates of Survey Outcomes to Adjust for Nonresponse: Examples from Multiple Surveys,” Journal of the Royal Statistical Society: Series A (Statistics in Society) , 173 , 389 – 407 . Google Scholar CrossRef Search ADS Lin I. , Schaeffer N. C. ( 1995 ), “Using Survey Participants to Estimate the Impact of Nonparticipation,” Public Opinion Quarterly , 59 , 236 – 258 . Google Scholar CrossRef Search ADS Little R. J. A. , Vartivarian S. ( 2005 ), “Does Weighting for Nonresponse Increase the Variance of Survey Means?” Survey Methodology , 31 , 161 – 168 . Möller J. , Walwei U. 
( 2009 ), “Editorial,” in Aktivierung, Erwerbstätigkeit und Teilhabe. Vier Jahre Grundsicherung für Arbeitsuchende , eds. Koch S. , Kupka P. , Steinke J. , p. 1112, Bielefeld : IAB-Bibliothek 315. Mostafa T. ( 2016 ), “Variation within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology , 19 , 355 – 375 . Google Scholar CrossRef Search ADS Olson J. A. ( 1999 ), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin , 62 , 73 – 85 . Olson K. ( 2013 ), “Paradata for Nonresponse Adjustment,” The Annals of the American Academy of Political and Social Science , 645 , 142 – 170 . Google Scholar CrossRef Search ADS Pasek J. , Jang S. M. , Cobb C. L. III , Dennis J. M. , DiSogra C. ( 2014 ), “Can Marketing Data Aid Survey Research? Examining Accuracy and Completeness in Consumer-File Data,” Public Opinion Quarterly , 78 , 889 – 916 . Google Scholar CrossRef Search ADS Peytchev A. , Olson K. ( 2007 ), “Using Interviewer Observations to Improve Nonresponse Adjustments: NES 2004,” paper presented at the Joint Statistical Meetings of the American Statistical Association, Salt Lake City, UT, available at https://digitalcommons.unl.edu/sociologyfacpub/145/. (Accessed March 24, 2018). Raghunathan T. E. , Van Howeyk J. ( 2008 ), Disclosure Risk Assessment for Survey Microdata , unpublished manuscript, University of Michigan . Sakshaug J. W. , Antoni M. ( 2017 ), “Errors in Linking Survey and Administrative Data,” in Total Survey Error in Practice , eds. Biemer P. , Eckman S. , Edwards B. , de Leeuw E. , Kreuter F. , Lyberg L. , Tucker C. , West B. , pp. 557 – 573, Hoboken, NJ : John Wiley and Sons . Google Scholar CrossRef Search ADS Sakshaug J. W. , Antoni M. , Sauckel R. 
( 2017 ), “The Quality and Selectivity of Linking Federal Administrative Records to Respondents and Nonrespondents in a General Population Survey in Germany,” Survey Research Methods , 1 , 63 – 80 . Sakshaug J. W. , Kreuter F. ( 2011 ), “Using Paradata and Other Auxiliary Data to Examine Mode Switch Nonresponse in a ‘Recruit-and-Switch’ Telephone Survey,” Journal of Official Statistics , 27 , 339 – 357 . Schouten B. , Shlomo N. , Skinner C. ( 2011 ), “Indicators for Monitoring and Improving Representativeness of Response,” Journal of Official Statistics , 27 , 1 – 24 . Sinibaldi J. , Trappmann M. , Kreuter F. ( 2014 ), “Which Is the Better Investment for Nonresponse Adjustment: Purchasing Commercial Auxiliary Data or Collecting Interviewer Observations?” Public Opinion Quarterly , 78 , 440 – 473 . Google Scholar CrossRef Search ADS Smith T. W. ( 2011 ), “The Report of the International Workshop on Using Multi-Level Data from Sample Frames, Auxiliary Databases, Paradata and Related Sources to Detect and Adjust for Nonresponse Bias in Surveys,” International Journal of Public Opinion Research , 23 , 389 – 402 . Google Scholar CrossRef Search ADS Smith T. W. , Kim J. ( 2013 ), “An Assessment of the Multi-Level Integrated Database Approach,” The Annals of the American Academy of Political and Social Science , 645 , 185 – 221 . Google Scholar CrossRef Search ADS Trappmann M. , Beste J. , Bethmann A. , Müller G. ( 2013 ), “The PASS Survey after Six Waves,” Journal of Labour Market Research , 46 , 275 – 281 . Google Scholar CrossRef Search ADS United Nations Economic Commission for Europe (UNECE) , 2007 . “Register-Based Statistics in the Nordic Countries: Review of Best Practices with Focus on Population and Social Statistics,” Technical Report E.07.II.E.11, United Nations, New York/Geneva, available at http://www.unece.org/fileadmin/DAM/stats/publications/Register_based_statistics_in_Nordic_countries.pdf. (Accessed March 24, 2018). Wagner J. 
( 2012 ), “A Comparison of Alternative Indicators for the Risk of Nonresponse Bias,” Public Opinion Quarterly , 76 , 555 – 575 . Google Scholar CrossRef Search ADS Wallgren A. , Wallgren B. ( 2007 ), Register-Based Statistics: Administrative Data for Statistical Purposes , West Sussex, England : Wiley . West B. T. , Kreuter F. , Trappmann M. ( 2014 ), “Is the Collection of Interviewer Observations Worthwhile in an Economic Panel Survey? New Evidence from the German Labor Market and Social Security (PASS) Study,” Journal of Survey Statistics and Methodology , 2 , 159 – 181 . Google Scholar CrossRef Search ADS West B. T. , Wagner J. , Hubbard F. , Gu H. ( 2015 ), “The Utility of Alternative Commercial Data Sources for Survey Operations and Estimation: Evidence from the National Survey of Family Growth,” Journal of Survey Statistics and Methodology , 3 , 240 – 264 . Google Scholar CrossRef Search ADS © The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Survey Statistics and Methodology Oxford University Press

Evaluating the Utility of Indirectly Linked Federal Administrative Records for Nonresponse Bias Adjustment

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research.
ISSN
2325-0984
eISSN
2325-0992
D.O.I.
10.1093/jssam/smy009

Abstract Survey researchers are actively seeking powerful auxiliary data sources capable of correcting for possible nonresponse bias in survey estimates of the general population. While several auxiliary data options exist, concerns about their usefulness for addressing nonresponse bias remain. One underutilized—but potentially rich—source of auxiliary data for nonresponse bias adjustment is federal administrative records. While federal records are routinely used to study nonresponse in countries where it is possible to directly link them (via a unique identifier) to population-based samples, such records are not widely used for this purpose in countries which lack a unique identifier to facilitate direct linkage. In this article, we examine the utility of indirectly linked administrative data from a federal employment database for nonresponse bias adjustment in a general population survey in Germany. In short, we find that the linked administrative variables have stronger correlations with the substantive survey variables than do standard paradata variables and that incorporating linked administrative data in nonresponse weighting adjustments reduces relative nonresponse bias to a greater extent than paradata-only weighting adjustments. However, for the majority of weighted survey estimates, including the administrative variables in the weighting adjustment procedure has minimal impact on the point estimates and their variances. We conclude with a general discussion of these findings and comment on the logistical issues associated with this type of linkage relevant to survey practice. 1. INTRODUCTION There is increasing interest in identifying auxiliary data sources capable of measuring and mitigating the effects of nonresponse bias in surveys (Smith 2011). 
It is well-established that auxiliary data collected on both respondents and nonrespondents permit more detailed investigations into the impact of nonresponse bias than the response rate alone (e.g., Schouten, Shlomo, and Skinner 2011; Wagner 2012). Public calls for powerful auxiliary information capable of adjusting for nonresponse bias have been made by prominent survey methodologists, including Groves (2006) who asserts that “effective surveys require the designer to anticipate nonresponse and actively seek auxiliary data that can be used to reduce the effect of the covariance of response propensities and the survey variables” (p. 670) and Smith (2011) who, in summarizing a workshop on using auxiliary data for nonresponse adjustment, remarks that “Much research is needed on the augmenting of sample-frame data with information from other databases and sources” (p. 392). In this article, we consider the use of a relatively underused source of auxiliary data for nonresponse bias adjustment: federal administrative records. In particular, we consider the situation where federal administrative records cannot be directly linked to the survey sample via a unique identifier and, instead, must be indirectly linked using error-prone and nonunique identifiers. The utility of this approach for addressing nonresponse bias is assessed using a novel linked-data source in Germany described in Sakshaug, Antoni, and Sauckel (2017). Using these data, we evaluate associations between the linked administrative variables and survey variables and determine whether incorporating the linked administrative variables in weighting adjustments improves nonresponse bias reduction over paradata alone. 2. BACKGROUND 2.1 Auxiliary Data Sources The range of auxiliary data sources available for general population surveys is sparse (Olson 2013). 
Process-oriented paradata recorded at each call attempt are among the most commonly used auxiliary data sources (Kreuter 2013), but their ability to detect and adjust for nonresponse bias in specific survey estimates is limited. Little and Vartivarian (2005) show that the most effective auxiliary variables for nonresponse bias adjustment are those that are strongly correlated with both the survey variables and the response outcome. While process paradata tend to be moderately correlated with the response outcome, their correlations with the survey variables are weaker (Lin and Schaeffer 1995; Peytchev and Olson 2007; Kreuter and Kohler 2009; Kreuter, Olson, Wagner, Yan, Ezzati‐Rice et al. 2010; Sakshaug and Kreuter 2011). As Smith (2011) puts it, “Process paradata are much more likely to be related to general causes of nonresponse and much less likely to be related to specific substantive variables. Their value is more in predicting nonresponse in general” (p. 395). Other forms of paradata, such as interviewer observations about the sampled neighborhood (e.g., cleanliness, safety), household (e.g., appearance, size), and household members (e.g., demographic characteristics, proxy variables), are more likely to be associated with substantive survey variables, but in practice these associations tend to be modest (Diez Roux 2001; Peytchev and Olson 2007; West, Kreuter, and Trappmann 2014). Alternatively, there is considerable interest in linking auxiliary information from commercial databases to sampled households (Smith 2011). Studies report relatively high linkage rates (generally greater than seventy percent) in general population samples; however, agreement rates and correlations between the commercial and survey variables vary widely (Raghunathan and Van Hoewyk 2008; DiSogra, Dennis, and Fahimi 2010; Pasek, Jang, Cobb, Dennis, and DiSogra 2014; Sinibaldi, Trappmann, and Kreuter 2014; West, Wagner, Hubbard, and Gu 2015). 
Further, commercial variables tend to yield only modest improvement in response propensity models with little impact on the resulting weighted estimates (West, Wagner, Hubbard, and Gu 2015). These studies also caution about the quality of commercial data, which can have high rates of item missing data, outdated information about household occupants, and contents which are not standardized across local government boundaries (Smith and Kim 2013). 2.2 Linked Administrative Data as an Auxiliary Data Source In the present study, we examine a relatively underutilized auxiliary data source that overcomes some of the above limitations: federal administrative records. Such records possess unique qualities that make them a particularly appealing source of auxiliary data for population samples. While not designed for research purposes, federal administrative records often contain detailed and relatively up-to-date information on population members, including participation in government-sponsored programs (e.g., welfare, healthcare) and details regarding financial matters (e.g., taxable earnings, healthcare expenditures). Typically, federal records are longitudinal in nature and document important life course transitions that surveys attempt to measure (e.g., labor force participation, unemployment duration, benefit receipt). For these reasons, federal administrative records are commonly linked to survey respondents to supplement the collected interview data (e.g., Olson 1999; Antoni and Bethmann 2018; Korbmacher and Czaplicki 2013; Freedman, McGonagle, and Andreski 2014; Knies and Burton 2014; Mostafa 2016). Beyond using federal administrative records as a supplementary data source for analyzing survey respondents, their contents related to common survey topics make them a potentially promising source of auxiliary data for addressing nonresponse bias. Several countries (e.g., the Netherlands, Sweden, Finland) already make use of administrative data for this purpose. 
This is facilitated through population registers which are used as sampling frames for household surveys (Blom and Carlsson 1999; UNECE 2007; Wallgren and Wallgren 2007). These population registers contain unique personal identifiers through which various substantive administrative databases can be directly linked to the drawn sample. However, this situation is atypical in countries where population registers either do not exist or lack a unique identifier that facilitates direct linkage. Linking federal administrative data to general population samples must therefore be carried out using indirect linkage procedures (e.g., probabilistic linkage) that rely on nonunique and error-prone identifiers (e.g., first and last name, address; for an overview of record linkage error sources, the reader is referred to Sakshaug and Antoni 2017). Published case studies of indirect linkages between federal administrative records and general population samples (respondents and nonrespondents) are rare, and it is unclear whether such linkages are useful for addressing nonresponse bias (Bee, Gathright, and Meyer 2015; Sakshaug, Antoni, and Sauckel 2017). Two relevant outcomes associated with this type of linkage are the linkage rate—the proportion of sample units that can be successfully linked to the target administrative database—and how representative the linked cases are of the entire data file. In nearly all indirect linkage applications there is a failure to link a subset of units, either because the linkage criterion is not met or because the target database does not contain a record for every sample unit. In both cases, the failure to link all units introduces the potential for linkage bias. 
Bee, Gathright, and Meyer (2015) investigated these issues in a study conducted at the US Census Bureau where sampled addresses from the 2011 Current Population Survey (CPS) Annual Social and Economic Supplement were indirectly linked to 2010 federal tax records compiled from the Internal Revenue Service (IRS) 1040 form. Household linkage rates of seventy-nine and seventy-six percent were reported for respondents and nonrespondents, respectively. Some linkage biases were reported: for example, low-income households were linked at a lower rate than higher-income households, likely due to the fact that lower-income households are not required to file tax returns and are, thus, underrepresented in the record base. In Germany, Sakshaug, Antoni, and Sauckel (2017) report the results of an indirect linkage performed on a general population sample of individuals from the German Panel Study “Labour Market and Social Security (PASS)” to a federal employment database maintained by the Institute for Employment Research (IAB) of the Federal Employment Agency (BA). The authors report a linkage rate of about sixty percent under a strict linkage criterion and eighty percent under a more relaxed criterion with similar linkage rates between respondents and nonrespondents. Older age groups, self-employed, and civil servants, who are known to be underrepresented in the employment database, were linked at lower rates compared to their counterparts. 2.3 Linked Administrative Data for Addressing Nonresponse Bias The previously mentioned case studies highlight the difficulty in achieving one hundred percent linkage rates for general population samples. However, for the majority of sample cases that can be linked, a key question is whether the linked administrative data are useful for studying nonresponse bias. 
Bee, Gathright, and Meyer (2015) addressed this question by examining whether CPS respondents differed from nonrespondents with respect to key variables derived from the linked tax records. Most notably, they found very little evidence of nonresponse bias in the CPS with respect to adjusted gross income as reported in the filed 1040 form. In contrast, they reported evidence of nonresponse bias with respect to the number of dependents, receipt of income from certain sources, the proportion of units with a married filer, and some demographic characteristics. A further issue that is underexplored is whether indirectly linked administrative data are useful for nonresponse bias adjustment. The above literature review suggests that these data likely possess key properties amenable to bias correction, but whether they substantially 1) improve model fit and 2) reduce nonresponse bias relative to paradata alone is unclear. To investigate this issue, we make use of the linked PASS survey and federal employment database of the IAB, reported in Sakshaug, Antoni, and Sauckel (2017), to address the following research questions:

RQ1: To what extent are linked administrative data from a federal employment database correlated with substantive survey variables and the response outcome? How do these correlations compare to those involving standard paradata?
RQ2: Does the inclusion of linked administrative variables in nonresponse adjustment models substantially improve model fit over paradata-only adjustment models?
RQ3: Does the inclusion of linked administrative variables in nonresponse-adjustment weights reduce nonresponse bias to a greater extent than paradata-only adjustment weights?
RQ4: To what extent do weighted survey estimates differ depending on whether linked administrative data are included in the nonresponse weighting procedure?

3. 
DATA AND METHODS 3.1 Survey Data Source The PASS is a longitudinal study of households conducted annually by the IAB in Germany since 2006 (Trappmann, Beste, Bethmann, and Müller 2013). The study was designed to measure the economic and social circumstances of individuals and their households in the aftermath of the reorganization of the welfare and unemployment benefits system (the “Hartz-Reforms”), which included the introduction of a new means-tested benefit scheme coined Unemployment Benefit II (UB II; Möller and Walwei 2009). The PASS study is based on separate, independent samples of UB II benefit recipients and residents of the general population. The UB II sample is drawn from an administrative list of all UB II recipients, and the general population sample is drawn from population lists amassed from municipality registration offices; in Germany, resident registration is mandatory. Both samples are drawn using a stratified cluster sampling design with (approximately) equal probabilities of selection. Each sample is composed of named individuals and all members of the sampled person’s household starting from age fifteen are interviewed. A household-level interview is completed by the household member most knowledgeable about the household situation. Data collection is carried out using a sequential mixed-mode design involving computer-assisted personal and telephone interviewing. Full details about the PASS methodology are available in Trappmann, Beste, Bethmann, and Müller (2013). Building on Sakshaug, Antoni, and Sauckel (2017), we utilize the 2011 PASS wave five general population refreshment sample. The total drawn sample consists of 6,237 individuals, of which 1,540 completed the PASS interview for a response rate of 25.1 percent (Response Rate 1; AAPOR 2016). In evaluating the utility of the administrative data for nonresponse adjustment, we make use of eight household-level and eight person-level PASS survey variables. 
The household-level variables include household size, presence of child under fifteen years of age, household ownership, material deprivation item index and material deprivation activity index (Berg et al. 2012), net income (in Euros) in past month, household savings, and received UB II at least once since 2009. The person-level variables include age (in years), sex, foreign citizenship, currently employed, total number of employer changes in lifetime, if ever registered as regular unemployed, presence of officially recognized disability, and if currently receives statutory pension payments. These variables and their coding schemes are described in Appendix table A.1 of the online supplementary material. The variables were selected based on their popularity among data users and discussions with the PASS team. Additionally, we make use of the following paradata variables collected during the PASS recruitment: total number of contact attempts, case initially refused interview, refusal conversion was applied, case involved an interviewer switch, and case involved a mode switch. 3.2 Administrative Data Source The administrative data source linked to the PASS wave five general population refreshment sample is the employment database of the IAB. The IAB database is constructed from administrative processes rendered by the BA, including social security notifications submitted by employers regarding their employees and registered activities concerning unemployment, job search, and participation in active labor market programs (Jacobebbinghaus and Seth 2007; Antoni, Ganzer, and vom Berge 2016). The culmination of these processes results in a database that covers the majority of the working population in Germany. 
Underrepresented groups include civil servants, the self-employed, and homemakers, who are exempt from making social security contributions.1 In the following analyses, we consider eight administrative variables: received UB II at least once since 2009, total number of employment spells in lifetime, currently employed, ever received regular unemployment benefit, age, sex, average daily wage, and foreign citizenship. These variables are commonly used in economic studies utilizing the IAB database (Boockmann, Ammermüller, Zwick, and Maier 2007; Kreuter, Müller, and Trappmann 2010; Baumgarten 2013; Burr, Rauch, Rose, Tisch, and Tophoven 2015). We note that each of these administrative variables has a similar counterpart among the selected PASS survey variables. For instance, age, sex, and UB II receipt are closely measured in both data sources. We expect moderate to high correlations between some of these counterparts, but others (e.g., employment variables) are likely to have lower correspondence due to construct and data generation differences. For example, only employment spells that are subject to social security contributions are included in the administrative counts, whereas the self-reported counts can include a broader range of employment spells. We believe this situation is representative of survey practice as linked administrative variables are likely to have similarities with the collected survey variables, but are not designed to be perfect replacements for them. 3.3 Linkage Procedures and Evaluation Before describing the linkage procedures, we reiterate that the purpose of linking the IAB employment database to the PASS general population sample is to maximize the amount of auxiliary information available for both respondents and nonrespondents. In Germany, record linkages are subject to strict data protection regulations and often require authorization from survey participants (Federal Data Protection Act 2013). 
The PASS survey routinely links interview data, conditional on respondent consent, to the IAB employment database. However, our use of linkage differs in that only the sample paradata are linked to the IAB database. That is, only process-oriented survey variables (e.g., sample disposition codes, number of call attempts), and no substantive survey variables, are linked to the administrative data. The IAB legal team confirmed that linking process-oriented variables does not require consent. Full details of the linkage procedure can be found in Sakshaug, Antoni, and Sauckel (2017). Here we provide a basic summary of the process. An indirect linkage procedure was performed on the 6,237 persons drawn from municipality records for the PASS wave five refreshment sample. Other members of the sampled person’s household were excluded from the linkage. The linkage procedure was based on eight non-unique linkage variables available in the sampling frame and IAB administrative data: first name, last name, zip code, city name, street name, house number, sex, and binary birth cohort (born before 1945 or after). Numerous preprocessing steps were implemented to standardize these variables. The actual linkage was carried out using three successive procedures: 1) deterministic linkage; 2) distance-based linkage; and 3) probabilistic linkage. The distance-based and probabilistic linkage procedures were performed using the Merge ToolBox (MTB) software package developed by the German Record Linkage Center.2 The deterministic linkage was performed using Stata. A match certainty index (MCI) was constructed to classify the strictness of a link. MCI values ranged from zero to seventeen with zero denoting a non-link and values one to seventeen denoting links obtained at thresholds of increasing strictness (i.e., higher values correspond to a higher certainty that a true link has been identified).3 We classify a case as linked if its MCI value is between thirteen and seventeen. 
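As an illustration of the final classification step only, the following minimal sketch treats a sampled case as linked when its MCI reaches a chosen threshold and reports the resulting linkage rate. The function names, the data layout, and the toy MCI values are assumptions made for illustration; this is not the MTB or Stata code used in the study.

```python
def is_linked(mci, threshold=13, max_mci=17):
    """Return True when an MCI value counts as a link under the given threshold."""
    if not 0 <= mci <= max_mci:
        raise ValueError(f"MCI must lie between 0 and {max_mci}, got {mci}")
    return mci >= threshold


def linkage_rate(mci_values, threshold=13):
    """Share of sampled cases classified as linked (0.0 for an empty sample)."""
    if not mci_values:
        return 0.0
    linked = sum(is_linked(m, threshold) for m in mci_values)
    return linked / len(mci_values)


# Hypothetical MCI values for ten sampled persons (0 = no link found).
toy_mci = [0, 17, 15, 9, 13, 0, 4, 16, 12, 17]

strict_rate = linkage_rate(toy_mci, threshold=13)   # MCI 13-17 counts as a link
relaxed_rate = linkage_rate(toy_mci, threshold=1)   # any non-zero MCI counts as a link
```

Lowering the threshold raises the linkage rate at the cost of admitting less certain links, which is the trade-off the MCI range is meant to control.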
This is a slightly stricter range than the one used by Sakshaug, Antoni, and Sauckel (2017), who classified a restrictive link within the MCI range six to seventeen. The revised MCI range was adopted after inspecting the linkage distribution (provided in Appendix figure A.1 of the online supplementary material) and identifying a relatively sparse number of links for each step of the range seven to twelve. The revised MCI range, which in our view represents a more natural cut-off based on the empirical distribution, corresponds to an overall linkage rate of 58.6 percent or a total of 3,653 linked cases out of 6,237. A total of 875 respondents and 2,778 nonrespondents were linked with linkage rates of 56.8 and 60.3 percent, respectively. These 3,653 linked cases serve as the basis for answering the four research questions. 3.4 Statistical Analysis 3.4.1 RQ1: correlation between linked administrative variables and survey variables. RQ1 is addressed by calculating absolute Pearson correlation coefficients for the eight administrative variables paired with the sixteen survey variables for the 625 (out of 875) respondents who consented to linkage of their interview data with the IAB administrative data. In addition, correlations between the administrative variables and the response outcome are presented based on all linked cases irrespective of linkage consent (n = 3,653). For comparison, the same correlations are presented for the paradata variables to assess the strength of their linear relationship with the substantive survey variables and the response outcome. 3.4.2 RQ2: incorporating linked administrative data in survey response models. RQ2 is addressed by fitting two logistic regression models with survey participation (1 = response; 0 = nonresponse) as the dependent variable. The first model conditions on the aforementioned paradata variables only and the second model conditions on the paradata and linked administrative variables. 
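The comparison of a paradata-only response model with a paradata-plus-administrative model can be sketched in a few lines. The example below is a minimal pure-Python illustration, not the Stata code used in the study: the gradient-ascent fitter, the function names, and the twelve toy cases (a centered contact-attempt count and a hypothetical employment indicator) are all assumptions. It fits both nested models and compares them using McFadden's pseudo R².

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, y, steps=5000, lr=0.1):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood.

    Returns [intercept, b1, ..., bp] for covariate rows of length p.
    """
    n, p = len(rows), len(rows[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, yi in zip(rows, y):
            resid = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def log_likelihood(beta, rows, y):
    """Bernoulli log-likelihood of the responses under the fitted coefficients."""
    ll = 0.0
    for x, yi in zip(rows, y):
        p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll


def mcfadden_r2(rows, y):
    """McFadden's pseudo R^2: 1 - logL(model) / logL(intercept-only model)."""
    ll_model = log_likelihood(fit_logistic(rows, y), rows, y)
    p_bar = sum(y) / len(y)
    ll_null = sum(math.log(p_bar) if yi else math.log(1 - p_bar) for yi in y)
    return 1.0 - ll_model / ll_null


# Hypothetical toy data: centered number of contact attempts (paradata),
# an employment indicator (administrative), and the response outcome.
attempts_c = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
employed = [1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
responded = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]

paradata_only = [[a] for a in attempts_c]
paradata_admin = [[a, e] for a, e in zip(attempts_c, employed)]

r2_paradata = mcfadden_r2(paradata_only, responded)
r2_full = mcfadden_r2(paradata_admin, responded)
```

Because the models are nested, the larger model cannot fit worse at its maximum; the practical question is whether the gain is large enough to matter.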
The impact of including the administrative variables in the response model is evaluated by assessing the statistical significance of the administrative data coefficients and looking for substantial improvement in model fit statistics, including McFadden’s Pseudo R2 and area under the ROC curve (AUC) relative to the paradata-only model. 3.4.3 RQ3: utility of administrative data for nonresponse bias reduction. RQ3 is addressed by assessing and comparing nonresponse bias for weighted estimates of the eight administrative variables. The weights are constructed using estimated response propensity scores (e.g., Brick and Kalton 1996) generated from the two regression models fitted from the previous analysis. For each model, the response propensity scores are generated, sorted from lowest to highest and divided into ten approximately equal-sized groups. The adjustment weight is calculated as the inverse of the average propensity score within each decile group. The same procedure is carried out using both regression models, yielding two sets of weights that vary in their level of covariate information: 1) paradata only; and 2) paradata and administrative data. Nonresponse bias for the estimated mean (or percentage) of each administrative variable is calculated by taking the difference between the (weighted or unweighted) estimate based on the linked respondents ($\bar{Y}_r$) and the estimate based on the linked respondents and nonrespondents ($\bar{Y}_n$):

$$\text{Nonresponse bias} = \bar{Y}_r - \bar{Y}_n.$$

A measure of absolute relative nonresponse bias (ARNB) is also reported which shows the amount of nonresponse bias in the linked respondent estimate relative to the linked sample (respondents and nonrespondents) estimate. Specifically, ARNB is calculated as:

$$\text{Absolute relative nonresponse bias (ARNB)} = \left| \frac{\bar{Y}_r - \bar{Y}_n}{\bar{Y}_n} \right|.$$

3.4.4 RQ4: impact of administrative data on weighted survey estimates.
To answer RQ4, we apply the two sets of adjustment weights from the previous analysis to the sixteen survey variables and calculate the difference between the weighted estimates ($\bar{Y}_{r,\text{wtd}}$) and the unweighted ($\bar{Y}_{r,\text{unwtd}}$) estimates to compare the impact of both sets of weights on the survey estimates:

$$\text{Difference between weighted and unweighted estimates} = \bar{Y}_{r,\text{wtd}} - \bar{Y}_{r,\text{unwtd}}.$$

Standard errors and coefficients of variation for each estimate are also reported to assess the impact of the weights on the variability of the estimates. A measure of absolute relative difference (ARD) is presented to quantify the magnitude of the difference between the weighted and unweighted point estimates in relation to the unweighted estimate:

$$\text{Absolute relative difference (ARD)} = \left| \frac{\bar{Y}_{r,\text{wtd}} - \bar{Y}_{r,\text{unwtd}}}{\bar{Y}_{r,\text{unwtd}}} \right|.$$

Both ARNB and ARD measures are reported in percentage terms by multiplying the above formulae by 100. All analyses are performed using the survey functions in Stata 14.1. 4. RESULTS 4.1 RQ1: Correlation between Linked Administrative Variables and Survey Variables In the first analysis, we compare the linked administrative variables and paradata variables with respect to their strength of association with the survey variables. Absolute Pearson correlation coefficients for the eight administrative variables paired with the sixteen survey variables are provided in figure 1 (see Appendix table A.2 in the online supplementary material for a tabular version of the correlations). The absolute correlations span the full range from zero to almost one.
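As a minimal sketch of the statistic plotted in figure 1, the absolute Pearson correlation between one administrative variable and one survey variable can be computed as below. The function name `abs_pearson` and the toy indicator data are illustrative assumptions and are not taken from the study, whose analyses were run in Stata.

```python
from math import sqrt


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return abs(sxy / sqrt(sxx * syy))


# Hypothetical toy data: an administrative 0/1 indicator and its survey counterpart
admin_indicator = [1, 0, 0, 1, 1, 0, 0, 1]
survey_indicator = [1, 0, 0, 1, 0, 0, 0, 1]
r = abs_pearson(admin_indicator, survey_indicator)
print(f"absolute correlation = {r:.2f}")
```

Taking the absolute value discards the sign, so the measure ranks variable pairs by strength of linear association only.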
The majority of the correlations are small: ninety-six (out of 128 possible correlations or seventy-five percent) lie in the range 0.00 to 0.20, twenty-two (or seventeen percent) lie between 0.20 and 0.40, and the remaining ten correlations (eight percent) exceed 0.40.4 As expected, the largest correlations are observed for administrative variables that have similar counterparts in the survey data (e.g., age [0.97], gender [0.99], UB II receipt since 2009 [0.77], ever registered as regular unemployed/received regular unemployment benefit [0.71]). Moderate to high correlations exist for several nonsimilar pairs of survey-administrative variables (e.g., employment status-average daily wage [0.64], household savings-UB II receipt [0.43], material deprivation activity index-UB II receipt [0.41]). Overall, the administrative variables that contribute to the greatest number of correlations with the survey variables of at least 0.20 are UB II receipt (eight), average daily wage (seven), ever received regular unemployment benefit (six), and age (five). Figure 1. Absolute Pearson Correlations of the Administrative Variables and Paradata Variables Paired with the Survey Variables and Response Outcome. For comparison, figure 1 also shows the correlations between the survey and paradata variables, which appear to be weaker than the survey-administrative variable pairs; the maximum correlation between the survey and paradata variables is 0.22 (a tabular version of all survey-paradata correlations is provided in Appendix table A.3 of the online supplementary material). This pattern persists even when the correlations between the similar survey-administrative variable pairs are ignored.
Figure 1 also indicates that both paradata and administrative data are only weakly correlated with the response outcome; the maximum absolute correlations with the response outcome are 0.08 and 0.12 for the administrative data and paradata, respectively. From these analyses, we can confirm that the administrative data generally yields higher correlations with the substantive survey variables compared with the paradata, and that both paradata and administrative data are only weakly associated with the response outcome. 4.2 RQ2: Incorporating Linked Administrative Data in Survey Response Models Next, we turn to the question of whether adding linked administrative variables as covariates in nonresponse adjustment models improves model fit over models that make use of paradata only. Table 1 presents two logistic regression models of response. Model one is the reduced model with paradata covariates only, and model two is the full model, which includes both paradata and administrative covariates. Two of the five paradata variables are statistically significant in each model: the number of contact attempts is positively associated, and switching interviewers is negatively associated with response. The magnitude of the association (and standard errors) with the paradata variables does not change with the addition of the administrative variables, indicating a different type of relationship between these two sets of variables. In model two, only two of the eight administrative variables yield a statistically significant association with response: age and foreign citizenship. Older age groups (44–53 and 54 or older) and the missing age group are more likely to respond relative to the youngest age group (30 or younger). There is no statistically significant difference in response between foreign and German citizens, but individuals for whom foreign status is missing are significantly less likely to respond compared with foreign citizens. 
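The coefficients discussed here are on the log-odds scale, so exponentiating them gives odds ratios, which can be easier to read. The short sketch below does this for three model two coefficients copied from table 1; the dictionary labels are illustrative, and the conversion is a standard property of logistic regression rather than a step reported by the authors.

```python
import math

# Log-odds coefficients copied from model two in table 1
coefficients = {
    "interviewer switch": -1.93,
    "age 44-53 (vs. 30 or younger)": 0.36,
    "age 54 or older (vs. 30 or younger)": 0.76,
}

# exp(coefficient) is the multiplicative change in the odds of response
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

An odds ratio above one indicates higher odds of response than in the reference category, and a value below one indicates lower odds.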
In terms of model fit, there is only minor improvement when the administrative covariates are added to the model: the Pseudo R2 increases from 0.27 to 0.29, and the area under the ROC curve (AUC) increases from 0.83 to 0.85. In summary, there is only slight evidence that adding administrative variables to the response model improves model fit.

Table 1. Logistic Regression Models of Survey Response on Paradata and Administrative Variables (n = 3,653)

| Variable | Model 1 (paradata only): Coef. | Model 1: Std. Error | Model 2 (paradata + administrative data): Coef. | Model 2: Std. Error |
| --- | --- | --- | --- | --- |
| Paradata variables | | | | |
| Case initially refused interview | −0.15 | 0.22 | −0.18 | 0.23 |
| Refusal conversion was applied | −0.42 | 0.35 | −0.47 | 0.36 |
| Case involved an interviewer switch | −1.85*** | 0.33 | −1.93*** | 0.33 |
| Total number of contact attempts: 1–2 | REF | REF | REF | REF |
| Total number of contact attempts: 3–5 | 3.58*** | 0.27 | 3.60*** | 0.27 |
| Total number of contact attempts: 6–10 | 4.87*** | 0.29 | 4.91*** | 0.28 |
| Total number of contact attempts: 11 or more | 5.01*** | 0.34 | 5.11*** | 0.33 |
| Case involved a mode switch | 0.20 | 0.34 | 0.18 | 0.34 |
| Administrative variables | | | | |
| Received UB II at least once since 2009 | − | − | −0.23 | 0.15 |
| Total number of employment spells in lifetime: 0–2 | − | − | REF | REF |
| Total number of employment spells in lifetime: 3–4 | − | − | 0.07 | 0.17 |
| Total number of employment spells in lifetime: 5–7 | − | − | 0.07 | 0.20 |
| Total number of employment spells in lifetime: 8 or more | − | − | 0.16 | 0.21 |
| Currently employed | − | − | 0.29 | 1.12 |
| Avg. daily wage (in Euros): 0 | − | − | REF | REF |
| Avg. daily wage (in Euros): 1–32 | − | − | −0.31 | 1.14 |
| Avg. daily wage (in Euros): 33–67 | − | − | −0.38 | 1.16 |
| Avg. daily wage (in Euros): 68–100 | − | − | −0.30 | 1.14 |
| Avg. daily wage (in Euros): 101 or higher | − | − | −0.41 | 1.16 |
| Ever received regular unemployment benefit | − | − | −0.18 | 0.13 |
| Age (in years): 30 or younger | − | − | REF | REF |
| Age (in years): 31–43 | − | − | 0.06 | 0.16 |
| Age (in years): 44–53 | − | − | 0.36* | 0.16 |
| Age (in years): 54 or older | − | − | 0.76*** | 0.20 |
| Age (in years): Missing | − | − | 9.04*** | 1.51 |
| Male | − | − | −0.01 | 0.11 |
| Foreign citizenship: Yes | − | − | REF | REF |
| Foreign citizenship: No | − | − | 0.48 | 0.31 |
| Foreign citizenship: Missing | − | − | −8.12*** | 1.50 |
| Intercept | −4.03*** | 0.21 | −4.66*** | 0.46 |
| Model fit statistics | | | | |
| AUC | 0.83 | | 0.85 | |
| Pseudo R2 | 0.27 | | 0.29 | |

† p < 0.10; *p < 0.05; **p < 0.01; ***p < 0.001.

4.3 RQ3: Utility of Administrative Data for Nonresponse Bias Reduction

Here, we turn to the issue of whether including administrative variables in nonresponse weighting adjustments enhances nonresponse bias reduction. Table 2 shows estimates of nonresponse bias and absolute relative nonresponse bias (ARNB) for each of the eight administrative variables estimated under the alternative weighting schemes (paradata only versus paradata-administrative data). The rationale for assessing nonresponse bias on the administrative data rather than on the survey data is simply due to the availability of the administrative data for both respondents and nonrespondents and the lack of high-quality benchmark information for the survey data.5 For comparison, nonresponse biases for the unweighted estimates are also shown. We remind readers that these estimates are not based on the full PASS sample, but rather the subset of linked cases.

Table 2. Estimates of Means/Percentages and Nonresponse Bias in Weighted and Unweighted Administrative Variables

| Administrative variable | Linked sample (n = 3,653) | Linked respondents (n = 875): no weights | Linked respondents: weighted (paradata) | Linked respondents: weighted (paradata + admin data) | Nonresponse bias: no weights | Nonresponse bias: weighted (paradata) | Nonresponse bias: weighted (paradata + admin data) | % ARNB: no weights | % ARNB: weighted (paradata) | % ARNB: weighted (paradata + admin data) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UBII_2009 (%) | 18.07 | 14.06 | 18.40 | 19.48 | −4.01 | 0.33 | 1.41 | 22.19 | 1.83 | 7.80 |
| EMPLOYED (%) | 60.75 | 62.17 | 56.85 | 64.06 | 1.42 | −3.90 | 3.31 | 2.34 | 6.42 | 5.45 |
| EMP_SPELLS | 5.30 | 5.28 | 5.28 | 5.42 | −0.02 | −0.02 | 0.12 | 0.32 | 0.36 | 2.25 |
| UNEMP_BEN (%) | 57.10 | 56.91 | 54.85 | 52.97 | −0.19 | −2.25 | −4.13 | 0.33 | 3.94 | 7.23 |
| AGE | 43.20 | 45.32 | 45.58 | 44.37 | 2.11 | 2.37 | 1.17 | 4.89 | 5.49 | 2.70 |
| FOREIGN (%) | 10.54 | 6.74 | 6.96 | 9.05 | −3.80 | −3.58 | −1.49 | 36.05 | 33.97 | 14.14 |
| MALE (%) | 52.70 | 53.71 | 51.68 | 51.63 | 1.01 | −1.02 | −1.07 | 1.92 | 1.94 | 2.03 |
| WAGE | 43.04 | 45.57 | 35.82 | 39.00 | 2.53 | −7.22 | −4.04 | 5.88 | 16.78 | 9.38 |
| Average ARNB (%) | – | – | – | – | – | – | – | 9.24 | 8.84 | 6.37 |

The table shows, for example, that UB II receipt is underrepresented among the linked respondents: the unweighted percentage is 14.06 percent among the linked respondents and 18.07 percent among the linked respondents and nonrespondents, which yields
a nonresponse bias of −4.01 percentage points and an ARNB of 22.19 percent, a moderately large bias. Both the paradata-only and combined paradata-administrative data weighting schemes succeed in reducing the ARNB to a more reasonable level (1.83 and 7.80 percent, respectively), with the paradata-only weights slightly outperforming the combined data weights. In contrast, the foreign citizenship variable, which has the largest ARNB overall, yields a substantial reduction in ARNB under the combined paradata-administrative data weighting scheme (from 36.05 to 14.14 percent) compared with only a minor reduction under the paradata-only weighting scheme (from 36.05 to 33.97 percent). In total, the combined paradata-administrative data weight outperforms the paradata-only weight for half of the estimates. However, for some of these estimates (e.g., currently employed, average daily wage), both sets of weights increase rather than decrease the level of bias. Overall, the average ARNB for the paradata-administrative data weighted estimates is about 2.5 percentage points smaller than the paradata-only case (6.37 versus 8.84 percent, respectively), which equates to a relative reduction of about twenty-eight percent due to the inclusion of the linked administrative data in the weighting process. 4.4 RQ4: Impact of Administrative Data on Weighted Survey Estimates The final analysis examines the impact that the linked administrative data have on the weighted estimates of the actual survey variables. Table 3 shows the weighted and unweighted estimates of the sixteen selected PASS survey variables and corresponding ninety-five percent confidence intervals (CIs). The weights are based on the full set of variables used in models one and two from table 1. It is readily apparent that the paradata-administrative data weighted estimates are similar to the paradata-only weighted estimates.
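The weights referred to here follow the propensity-class procedure of section 3.4.3: sort the estimated response propensities, split them into ten roughly equal-sized classes, and weight each case by the inverse of its class's mean propensity. A minimal sketch of that step is given below. The function names and all data are hypothetical, and forming the classes over all linked cases (rather than respondents only) is an assumption, since the text does not spell this out.

```python
def propensity_class_weights(propensities, n_classes=10):
    """Return one weight per case: the inverse of the mean propensity in its class."""
    n = len(propensities)
    if n < n_classes:
        raise ValueError("need at least as many cases as classes")
    # Case indices sorted from lowest to highest estimated propensity
    order = sorted(range(n), key=lambda i: propensities[i])
    weights = [0.0] * n
    for k in range(n_classes):
        # Integer slice bounds give classes whose sizes differ by at most one
        members = order[k * n // n_classes:(k + 1) * n // n_classes]
        mean_propensity = sum(propensities[i] for i in members) / len(members)
        for i in members:
            weights[i] = 1.0 / mean_propensity
    return weights


def weighted_mean(values, weights):
    """Weighted mean of a variable observed for respondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical propensities for 20 linked cases (not taken from the study)
propensities = [round(0.05 + 0.04 * i, 2) for i in range(20)]
weights = propensity_class_weights(propensities)

# Hypothetical respondent subset (every other case) and a 0/1 outcome for them
respondent_ids = list(range(1, 20, 2))
outcome = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
estimate = weighted_mean(outcome, [weights[i] for i in respondent_ids])
print(f"weighted estimate = {estimate:.3f}")
```

Averaging propensities within classes, instead of using each case's own inverse propensity, is a common way to limit the influence of very small individual propensity estimates on the weights.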
Each of the ninety-five percent CIs overlap, indicating that including the administrative data in the weighting adjustment procedure does not substantially alter the survey estimates. This finding is further supported when examining the absolute relative difference (ARD) between the weighted and unweighted estimates; for the majority of the survey estimates, the ARD values under both weighting schemes differ by less than ten percentage points. Moreover, the average ARD values under each weighting scheme differ only by about five percentage points, suggesting that both weighting schemes have similar impact on the point estimates.

Table 3. Estimates of Means/Percentages and Differences Between Weighted and Unweighted Household- and Person-Level Survey Variables (linked respondents, n = 875; ninety-five percent CIs in parentheses)

| Survey variable | No weights | Weighted (paradata) | Weighted (paradata + admin data) | Difference from unweighted: weighted (paradata) | Difference from unweighted: weighted (paradata + admin data) | % ARD: weighted (paradata) | % ARD: weighted (paradata + admin data) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Household-level variables | | | | | | | |
| Household size | 2.52 (2.38, 2.66) | 2.05 (1.86, 2.24) | 2.16 (1.98, 2.33) | −0.47 | −0.36 | 18.76 | 14.44 |
| Presence of child under 15 years of age (%) | 22.53 (18.75, 26.83) | 16.45 (12.82, 20.85) | 18.31 (14.40, 22.99) | −6.08 | −4.22 | 26.99 | 18.73 |
| Household ownership (%) | 42.51 (37.60, 47.57) | 30.84 (24.74, 37.69) | 29.28 (23.47, 35.85) | −11.67 | −13.23 | 27.45 | 31.12 |
| Material deprivation item index | 0.71 (0.61, 0.82) | 0.99 (0.57, 1.15) | 0.87 (0.50, 1.25) | 0.28 | 0.16 | 39.66 | 22.50 |
| Material deprivation activity index | 2.97 (2.78, 3.17) | 3.21 (2.68, 3.73) | 2.97 (2.36, 3.57) | 0.23 | −0.01 | 7.77 | 0.20 |
| Net income (in Euros) in past month | 2457.31 (2296.55, 2618.07) | 2038.11 (1758.95, 2317.27) | 2101.49 (1788.50, 2414.47) | −419.20 | −355.82 | 17.06 | 14.48 |
| Household savings | 4.11 (3.83, 4.38) | 3.61 (3.11, 4.11) | 3.67 (3.17, 4.16) | −0.49 | −0.44 | 11.99 | 10.69 |
| Received UB II at least once since 2009 (%) | 13.29 (10.55, 16.62) | 14.98 (8.95, 24.02) | 16.15 (10.57, 23.91) | 1.69 | 2.86 | 12.72 | 21.52 |
| Person-level variables | | | | | | | |
| Age in years | 45.11 (43.84, 46.39) | 47.39 (44.00, 50.78) | 43.28 (40.36, 46.31) | 2.28 | −1.83 | 5.05 | 4.06 |
| Female (%) | 46.10 (42.33, 49.92) | 43.65 (35.83, 51.81) | 42.18 (34.78, 49.95) | −2.45 | −3.92 | 5.31 | 8.50 |
| Foreign born (%) | 13.17 (9.84, 17.41) | 12.46 (7.67, 19.62) | 15.37 (7.94, 27.66) | −0.71 | 2.20 | 5.39 | 16.70 |
| Currently employed (%) | 51.54 (47.95, 55.12) | 39.74 (32.01, 48.01) | 42.10 (33.12, 51.63) | −11.80 | −9.44 | 22.89 | 18.32 |
| Total number of employer changes in lifetime | 3.16 (2.82, 3.49) | 3.37 (2.80, 3.95) | 3.25 (2.60, 3.91) | 0.22 | 0.10 | 6.84 | 3.04 |
| Ever been registered as unemployed (%) | 41.00 (34.81, 47.49) | 43.85 (32.98, 55.33) | 44.18 (32.71, 56.30) | 2.85 | 3.18 | 6.95 | 7.76 |
| Has an officially recognized disability (%) | 12.56 (10.19, 15.39) | 18.11 (11.74, 26.88) | 14.14 (9.85, 19.89) | 5.55 | 1.58 | 44.19 | 12.58 |
| Currently receives statutory pension payments (%) | 17.56 (13.87, 21.98) | 23.85 (16.00, 34.00) | 16.55 (11.29, 23.62) | 6.29 | −1.01 | 35.82 | 5.75 |
| Average ARD: Overall (%) | – | – | – | – | – | 18.43 | 13.15 |

A limitation of this analysis is the absence of benchmark data to validate whether any shifts in the weighted survey estimates reflect an actual reduction in nonresponse bias. We can only speculate on the possible impact of the weights by gleaning information about the direction of nonresponse bias observed for similarly-measured administrative variables studied in the previous section (section 4.3). In particular, we consider the two variables found to be most affected by nonresponse bias: UB II receipt since 2009 and foreign citizenship. UB II receipt measured in the administrative data was found to be underrepresented among linked respondents, and both sets of weights succeeded in increasing the representation of UB II recipients and reducing nonresponse bias (see table 2). The survey estimate of UB II experiences a similar increase under both weighting schemes, suggesting that both weighting approaches reduce nonresponse bias for this variable. Foreign citizenship, as measured in the administrative data, was also underrepresented among linked respondents, and both weighting approaches reduced nonresponse bias by shifting the estimate upward, with the combined paradata-administrative data weight yielding a much larger bias reduction than the paradata-only weight.
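The table 2 quantities referred to in this comparison follow directly from the definitions in section 3.4.3. As a worked check, the sketch below recomputes the ARNB values for UB II receipt and foreign citizenship from the rounded estimates printed in table 2, and the relative change in average ARNB between the two weighting schemes. The function and variable names are illustrative; because the inputs are rounded, recomputed values for other rows of the table can differ from the published ones in the last digit.

```python
def nonresponse_bias(respondent_estimate, sample_estimate):
    """Respondent-based estimate minus the full linked-sample estimate."""
    return respondent_estimate - sample_estimate


def arnb(respondent_estimate, sample_estimate):
    """Absolute relative nonresponse bias, in percent."""
    bias = nonresponse_bias(respondent_estimate, sample_estimate)
    return abs(bias / sample_estimate) * 100


# Rounded estimates from table 2:
# (linked sample, unweighted, paradata weights, paradata + admin weights)
table2 = {
    "UBII_2009": (18.07, 14.06, 18.40, 19.48),
    "FOREIGN": (10.54, 6.74, 6.96, 9.05),
}

results = {}
for name, (sample, unweighted, w_para, w_both) in table2.items():
    results[name] = [round(arnb(est, sample), 2) for est in (unweighted, w_para, w_both)]
    print(name, results[name])

# Average ARNB across the eight variables, as reported in table 2
avg_paradata, avg_combined = 8.84, 6.37
relative_reduction = (avg_paradata - avg_combined) / avg_paradata * 100
print(f"relative reduction in average ARNB = {relative_reduction:.1f} percent")
```

The final figure is the relative reduction in average ARNB attributable to adding the administrative variables, expressed as a percentage of the paradata-only average.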
A similar picture emerges in the survey data as only the combined-data weighted estimate of foreign born experiences an upward shift relative to the unweighted estimate; thus, using the administrative data in the weighting process may do a better job of reducing nonresponse bias for this particular survey item relative to the standard (paradata-only) weighting approach. In addition to the point estimates, it is also important to consider the impact of the weighting schemes on the variances of the survey estimates. We examined the standard errors and coefficients of variation (CV) for the survey estimates under the alternative weighting schemes (see Appendix table A.4 in the online supplementary material). In general, only minor CV differences exist between the two weighting approaches. The combined paradata-administrative data weight succeeds in reducing the CV over the paradata-only weight for half of the estimates. Little and Vartivarian (2005) concluded that the variance of an estimate should decrease when the weighting variables are highly correlated with the variable of interest. This appears to be the case for the UB II receipt item, which is highly correlated (0.77) with its administrative counterpart. This item experiences the largest CV reduction after incorporating the administrative variables into the weighting process (from 0.25 to 0.20). Thus, the administrative variables appear to reduce both nonresponse bias and variance for this survey item. The foreign citizenship item, on the other hand, which has a relatively weaker correlation with its administrative counterpart (0.43), is affected by an increased CV (from 0.23 to 0.31) when the administrative variables are used in the weighting procedure. Hence, the speculative reduction in nonresponse bias for this item is not accompanied by a reduction in variability under the enhanced weighting scheme. 5. DISCUSSION The results of this case study can be distilled into five main findings. 
First, although the linked administrative variables were only weakly associated with the majority of substantive survey variables, several pairs of variables across the two data sources produced moderate-to-high correlations. Second, correlations with the substantive survey variables were generally higher for the linked administrative variables than for the process-oriented paradata variables, a finding which held even after excluding the similarly measured survey-administrative variable pairs from the comparison. However, both paradata and administrative data were poorly associated with survey participation. Third, adding linked administrative variables to the response propensity model did not substantially improve model fit relative to the paradata-only response model, and only a few administrative variables were statistically significant predictors of response. Fourth, despite weak associations with the response outcome, incorporating the linked administrative variables into the nonresponse adjustment procedure reduced the average relative nonresponse bias to a greater extent than the paradata-only adjustment procedure. Lastly, utilizing the administrative data did not substantially impact the survey-weighted estimates and their variances compared with the paradata-only weighted estimates; however, there was an indication that the administrative variables reduced nonresponse bias and variance for a particular survey item (UB II benefit receipt) that was highly correlated with the administrative data, a result that is consistent with the empirical findings reported in Little and Vartivarian (2005).

It is common practice to use federal administrative records in survey research, but they are significantly underutilized as a source of auxiliary data for addressing nonresponse bias in general population samples, particularly in applications where direct linkages are not possible.
In light of this situation, it is useful to know that indirectly linking federal administrative data to survey samples can yield benefits in terms of assessing and reducing nonresponse bias for the linked cases. The finding that linked administrative data outperform process-oriented paradata in terms of their associations with the survey variables and in reducing nonresponse bias is a useful step forward in the search for powerful auxiliary information capable of combating the effects of nonresponse.

While indirectly linked federal administrative records may offer potential benefits for addressing nonresponse bias, there are a number of technical issues that warrant attention before the procedure can be used in a production environment. For example, inevitably there will be sample cases that cannot be linked to the administrative database, as was observed in the present study and in other studies involving auxiliary data linkages (Raghunathan and Van Howeyk 2008; DiSogra, Dennis, and Fahimi 2010; Pasek et al. 2014; Sinibaldi, Trappmann, and Kreuter 2014; West, Wagner, Hubbard, and Gu 2015). As such, one cannot draw definitive conclusions about nonresponse bias in the survey as a whole based only on the subset of cases that can be linked. It is possible to increase the linkage rate by relaxing the linkage criterion, but this strategy introduces a trade-off, as it may increase the likelihood of false-positive links and weaken associations with the substantive survey variables (see footnote 4), thus diminishing the utility of the linked data and possibly rendering the linkage exercise moot. It is also important to note that the PASS-administrative data linkage was conducted “in-house” at the IAB, where both the PASS survey and IAB employment database are housed. This situation simplified the linkage process immensely from both a technical and a legal standpoint.
Conducting linkages with survey-administrative data sources belonging to different agencies poses additional challenges, including the need to obtain approvals from various stakeholders and reach agreement on data sharing procedures. While not insurmountable, such negotiations are not straightforward and may carry on longer than expected. Large-scale surveys that currently perform linkages with respondent interview data may have the advantage of an established relationship with the administrative data sponsor, potentially simplifying the process of extending the linkages to include the noninterviewed cases for nonresponse analysis.

Assuming these issues can be overcome, we recommend that indirectly linked administrative data be considered as a supplement to existing auxiliary data options for addressing nonresponse bias. Despite modest reductions in nonresponse bias and minimal impact on the weighted survey estimates and corresponding variances, our case study points to potentially strong associations between federal administrative data and substantive survey variables that could be leveraged to address more severe cases of nonresponse bias. We envision multiple ways in which these properties could be harnessed in future work, including studying prospective nonresponse bias in longitudinal surveys, monitoring sample representativeness during fieldwork, and informing data collection interventions in a responsive design framework.

Supplementary Materials

Supplementary materials are available online at academic.oup.com/jssam.

Footnotes

1 These subgroups comprise roughly 12.5 percent of the total population aged fifteen to sixty-five based on figures from the 2011 Microcensus data obtained from the Federal Statistical Office.
2 See www.record-linkage.de for more details on the German Record Linkage Center and the Merge ToolBox software.
3 Despite our relatively comprehensive list of non-unique identifiers, false-positive or one-to-many links are still a possibility. This can be due to missing, incorrect, imprecise, or (inconsistently) abbreviated names and addresses in either of the databases. One linkage variable, in particular, that contributed to several false-positive links as reported in the linkage evaluation by Sakshaug, Antoni, and Sauckel (2017) was the dichotomous indicator of birth cohort. This broadly defined indicator of age yielded several false links and one-to-many links with persons within the same household but from a different generation than the sampled person. The authors reported that increasing the strictness of a link (based on the MCI scale) substantially reduced the age discrepancies for the linked cases (p. 69).
4 We also assessed these correlations under less strict linkage criteria (i.e., using MCI values between one and twelve). In general, we found that gradually lowering the linkage threshold monotonically dampens the correlations between the survey and administrative variables. This result was not unexpected given that the rate of false-positive links was found to be higher for the smaller MCI values as reported in Sakshaug, Antoni, and Sauckel (2017).
5 To avoid circular use of the same administrative variable in both the weight generation and survey estimation processes, we remove the target administrative variable from the logistic regression model used to estimate the response propensity scores and resulting weights. This variable removal procedure is performed separately for each target administrative variable used in the survey estimation. For example, the UBII_2009 administrative variable was dropped from the logistic regression model during the process of estimating nonresponse bias for this variable in table 2.

REFERENCES

The American Association for Public Opinion Research (AAPOR) (2016), Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys (9th ed.), AAPOR, available at https://www.aapor.org/AAPOR_Main/media/publications/Standard-Definitions20169theditionfinal.pdf. (Accessed March 24, 2018).
Antoni M., Ganzer A., vom Berge P. (2016), “Sample of Integrated Labour Market Biographies (SIAB) 1975-2014,” FDZ-Datenreport, 04/2016 (en), Nuremberg.
Antoni M., Bethmann A. (2018), “PASS-ADIAB – Linked Survey and Administrative Data for Research on Unemployment and Poverty,” Journal of Economics and Statistics [online], DOI: https://doi.org/10.1515/jbnst-2018-0002. Available at https://www.degruyter.com/view/j/jbnst.ahead-of-print/jbnst-2018-0002/jbnst-2018-0002.xml.
Baumgarten D. (2013), “Exporters and the Rise in Wage Inequality: Evidence from German Linked Employer–Employee Data,” Journal of International Economics, 90, 201–217.
Bee C. A., Gathright G. M. R., Meyer B. D. (2015), “Bias from Unit Non-Response in the Measurement of Income in Household Surveys,” paper presented at the Joint Statistical Meetings of the American Statistical Association, Seattle, WA, available at http://www.sole-jole.org/16068.pdf. (Accessed March 24, 2018).
Berg M., Cramer R., Dickmann D., Gilberg R., Jesske B., Kleudgen M., Bethmann A., Fuchs B., Trappmann M., Wurdack A. (2012), “Codebook and Documentation of the Panel Study ‘Labour Market and Social Security’ (PASS) Wave 5,” FDZ-Datenreport, 06/2012 (en), Institute for Employment Research, Nuremberg.
Blom E., Carlsson F. (1999), “Registers in Official Statistics: A Swedish Perspective,” Invited paper for the Joint/ECE/Eurostat Work Session on Registers and Administrative Records for Social and Demographic Statistics, Geneva, available at http://www.unece.org/fileadmin/DAM/stats/documents/1999/03/registers/14.e.pdf. (Accessed March 24, 2018).
Boockmann B., Ammermüller A., Zwick T., Maier M. (2007), “Do Hiring Subsidies Reduce Unemployment among the Elderly? Evidence from Two Natural Experiments,” ZEW Discussion Papers No. 07-001.
Brick J. M., Kalton G. (1996), “Handling Missing Data in Survey Research,” Statistical Methods in Medical Research, 5, 215–238.
Burr H., Rauch A., Rose U., Tisch A., Tophoven S. (2015), “Employment Status, Working Conditions and Depressive Symptoms among German Employees Born in 1959 and 1965,” International Archives of Occupational and Environmental Health, 88, 731–741.
Diez Roux A. V. (2001), “Investigating Neighborhood and Area Effects on Health,” American Journal of Public Health, 91, 1783–1789.
DiSogra C., Dennis J. M., Fahimi M. (2010), “On the Quality of Ancillary Data Available for Address-Based Sampling,” paper presented at the Joint Statistical Meetings of the American Statistical Association, Vancouver, British Columbia, available at http://www.knowledgenetworks.com/ganp/docs/jsm2010/On-the-Quality-of-Ancillary-ABS-2010-JSM-submission.pdf. (Accessed March 24, 2018).
Federal Data Protection Act (2013), “Admissibility of Data Collection, Processing and Use,” Bundesministerium der Justiz und für Verbraucherschutz, available at http://www.gesetze-im-internet.de/englisch_bdsg/index.html. (Accessed March 24, 2018).
Freedman V. A., McGonagle K., Andreski P. (2014), “The Panel Study of Income Dynamics Linked Medicare Claims Data,” PSID Technical Report Series 14-01, University of Michigan.
Groves R. M. (2006), “Nonresponse Rates and Nonresponse Bias in Household Surveys,” Public Opinion Quarterly, 70, 646–675.
Jacobebbinghaus P., Seth S. (2007), “The German Integrated Employment Biographies Sample IEBS,” Schmollers Jahrbuch: Journal of Applied Social Science Studies, 127, 335–342.
Knies G., Burton J. (2014), “Analysis of Four Studies in a Comparative Framework Reveals: Health Linkage Consent Rates on British Cohort Studies Higher than on UK Household Panel Surveys,” BMC Medical Research Methodology, 14, 125.
Korbmacher J., Czaplicki C. (2013), “Linking SHARE Survey Data with Administrative Records: First Experiences from SHARE-Germany,” in SHARE Wave 4: Innovations & Methodology, eds. Malter F., Börsch-Supan A., pp. 47–53, Munich: MEA, Max Planck Institute for Social Law and Social Policy.
Kreuter F. (2013), Improving Surveys with Paradata: Analytic Uses of Process Information, Hoboken, NJ: Wiley.
Kreuter F., Kohler U. (2009), “Analyzing Contact Sequences in Call Record Data: Potential and Limitations of Sequence Indicators for Nonresponse Adjustments in the European Social Survey,” Journal of Official Statistics, 25, 203–226.
Kreuter F., Müller G., Trappmann M. (2010), “Nonresponse and Measurement Error in Employment Research: Making Use of Administrative Data,” Public Opinion Quarterly, 74, 880–906.
Kreuter F., Olson K., Wagner J., Yan T., Ezzati-Rice T. M., Casas-Cordero C., Lemay M., Peytchev A., Groves R. M., Raghunathan T. E. (2010), “Using Proxy Measures and Other Correlates of Survey Outcomes to Adjust for Nonresponse: Examples from Multiple Surveys,” Journal of the Royal Statistical Society: Series A (Statistics in Society), 173, 389–407.
Lin I., Schaeffer N. C. (1995), “Using Survey Participants to Estimate the Impact of Nonparticipation,” Public Opinion Quarterly, 59, 236–258.
Little R. J. A., Vartivarian S. (2005), “Does Weighting for Nonresponse Increase the Variance of Survey Means?” Survey Methodology, 31, 161–168.
Möller J., Walwei U. (2009), “Editorial,” in Aktivierung, Erwerbstätigkeit und Teilhabe. Vier Jahre Grundsicherung für Arbeitsuchende, eds. Koch S., Kupka P., Steinke J., pp. 11–12, Bielefeld: IAB-Bibliothek 315.
Mostafa T. (2016), “Variation within Households in Consent to Link Survey Data to Administrative Records: Evidence from the UK Millennium Cohort Study,” International Journal of Social Research Methodology, 19, 355–375.
Olson J. A. (1999), “Linkages with Data from Social Security Administrative Records in the Health and Retirement Study,” Social Security Bulletin, 62, 73–85.
Olson K. (2013), “Paradata for Nonresponse Adjustment,” The Annals of the American Academy of Political and Social Science, 645, 142–170.
Pasek J., Jang S. M., Cobb C. L. III, Dennis J. M., DiSogra C. (2014), “Can Marketing Data Aid Survey Research? Examining Accuracy and Completeness in Consumer-File Data,” Public Opinion Quarterly, 78, 889–916.
Peytchev A., Olson K. (2007), “Using Interviewer Observations to Improve Nonresponse Adjustments: NES 2004,” paper presented at the Joint Statistical Meetings of the American Statistical Association, Salt Lake City, UT, available at https://digitalcommons.unl.edu/sociologyfacpub/145/. (Accessed March 24, 2018).
Raghunathan T. E., Van Howeyk J. (2008), Disclosure Risk Assessment for Survey Microdata, unpublished manuscript, University of Michigan.
Sakshaug J. W., Antoni M. (2017), “Errors in Linking Survey and Administrative Data,” in Total Survey Error in Practice, eds. Biemer P., Eckman S., Edwards B., de Leeuw E., Kreuter F., Lyberg L., Tucker C., West B., pp. 557–573, Hoboken, NJ: John Wiley and Sons.
Sakshaug J. W., Antoni M., Sauckel R. (2017), “The Quality and Selectivity of Linking Federal Administrative Records to Respondents and Nonrespondents in a General Population Survey in Germany,” Survey Research Methods, 1, 63–80.
Sakshaug J. W., Kreuter F. (2011), “Using Paradata and Other Auxiliary Data to Examine Mode Switch Nonresponse in a ‘Recruit-and-Switch’ Telephone Survey,” Journal of Official Statistics, 27, 339–357.
Schouten B., Shlomo N., Skinner C. (2011), “Indicators for Monitoring and Improving Representativeness of Response,” Journal of Official Statistics, 27, 1–24.
Sinibaldi J., Trappmann M., Kreuter F. (2014), “Which Is the Better Investment for Nonresponse Adjustment: Purchasing Commercial Auxiliary Data or Collecting Interviewer Observations?” Public Opinion Quarterly, 78, 440–473.
Smith T. W. (2011), “The Report of the International Workshop on Using Multi-Level Data from Sample Frames, Auxiliary Databases, Paradata and Related Sources to Detect and Adjust for Nonresponse Bias in Surveys,” International Journal of Public Opinion Research, 23, 389–402.
Smith T. W., Kim J. (2013), “An Assessment of the Multi-Level Integrated Database Approach,” The Annals of the American Academy of Political and Social Science, 645, 185–221.
Trappmann M., Beste J., Bethmann A., Müller G. (2013), “The PASS Survey after Six Waves,” Journal of Labour Market Research, 46, 275–281.
United Nations Economic Commission for Europe (UNECE) (2007), “Register-Based Statistics in the Nordic Countries: Review of Best Practices with Focus on Population and Social Statistics,” Technical Report E.07.II.E.11, United Nations, New York/Geneva, available at http://www.unece.org/fileadmin/DAM/stats/publications/Register_based_statistics_in_Nordic_countries.pdf. (Accessed March 24, 2018).
Wagner J. (2012), “A Comparison of Alternative Indicators for the Risk of Nonresponse Bias,” Public Opinion Quarterly, 76, 555–575.
Wallgren A., Wallgren B. (2007), Register-Based Statistics: Administrative Data for Statistical Purposes, West Sussex, England: Wiley.
West B. T., Kreuter F., Trappmann M. (2014), “Is the Collection of Interviewer Observations Worthwhile in an Economic Panel Survey? New Evidence from the German Labor Market and Social Security (PASS) Study,” Journal of Survey Statistics and Methodology, 2, 159–181.
West B. T., Wagner J., Hubbard F., Gu H. (2015), “The Utility of Alternative Commercial Data Sources for Survey Operations and Estimation: Evidence from the National Survey of Family Growth,” Journal of Survey Statistics and Methodology, 3, 240–264.

© The Author(s) 2018. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Journal

Journal of Survey Statistics and Methodology, Oxford University Press

Published: May 22, 2018
