Sequential Hierarchical Regression Imputation

Abstract

Skip patterns, bounds, and diverse measurement scales often exacerbate the problem of item nonresponse in the analysis of survey data. Sequential, or variable-by-variable, imputation techniques have been applied quite successfully to overcome such problems. Most of these techniques have so far focused on relatively simple designs, and studies have demonstrated their consistency with techniques that draw from a joint posterior predictive distribution of the missing data. Here we consider a sequential imputation technique based on a family of hierarchical regression models, extending the sequential approach to correlated data (e.g., clustered data), and assess its performance. Each of the regression models is tailored to the variable being handled. The computational techniques used to approximate the posterior predictive distributions are based on Markov chain Monte Carlo (MCMC) and numerical integration, which overcome the problem of intractability. We present a simulation study assessing the compatibility of this approach with the joint data-generation mechanism. In the scenarios studied, the sequential method leads to well-calibrated estimates and often performs better than methods currently available to practitioners.

1. INTRODUCTION

Missing data are one of the most pervasive problems in data analysis. Since the seminal work by Rubin (1987), survey analysts and methodologists have practiced multiple imputation (MI), a simulation-based approach to facilitating inference in data sets with incomplete values. Under the MI paradigm, the posterior predictive distribution of the missing data is simulated to replace the missing data with pseudo-random values, typically I > 1 times. Each of the I imputed data sets is analyzed in the same fashion by a complete-data method, and the results are then combined by following Rubin's rules (Rubin 1987). The process of creating imputations, analyzing the imputed data sets, and combining the results is a Monte Carlo version of averaging the statistical results over the predictive distribution of the missing data. MI can be highly efficient even for small I, and one good set of imputations can be used for many different analytic problems.

Under model-based MI, one must impose a probability model on the complete data (observed and missing values), along with a model or assumptions about the mechanism causing missing data. The models are then used to form the posterior predictive distribution of the missing data, from which the imputations are sampled. In general, there are two approaches to forming an imputation model: joint modeling (Schafer 1997; Little and Rubin 2002) and sequential modeling (van Buuren and Groothuis-Oudshoorn 2011; Raghunathan, Lepkowski, Van Hoewyk, and Solenberger 2001; and others referenced below). Joint imputation modeling can be preferable when the problem consists of a relatively small number of items, and it often uses MCMC-type algorithms for drawing missing values from their "joint" posterior predictive distribution. The analytical complexity of the problem is often heightened when missingness occurs for many variables with diverse measurement scales. Suitable computational tools allowing researchers to conduct MI in such settings have centered on the notion of sequential, or variable-by-variable, imputation (also known as chained, or iterative, imputation).
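As a concrete illustration of the combining step just described, the following minimal Python sketch applies Rubin's (1987) rules to I completed-data estimates of a scalar quantity. It is a sketch only; the function and array names are ours, not from any particular package.

```python
# Minimal sketch of Rubin's (1987) combining rules for I imputed data sets.
import numpy as np

def rubin_combine(est, var):
    """est: I point estimates; var: their I completed-data variances."""
    est, var = np.asarray(est, float), np.asarray(var, float)
    I = len(est)
    q_bar = est.mean()                      # MI point estimate
    u_bar = var.mean()                      # within-imputation variance
    b = est.var(ddof=1)                     # between-imputation variance
    t = u_bar + (1.0 + 1.0 / I) * b         # total variance of q_bar
    # Rubin's degrees of freedom for the reference t distribution
    df = (I - 1) * (1.0 + u_bar / ((1.0 + 1.0 / I) * b)) ** 2
    return q_bar, t, df

# e.g., combining five imputed-data estimates of a regression coefficient
q, t, df = rubin_combine([0.51, 0.48, 0.55, 0.50, 0.47],
                         [0.010, 0.011, 0.009, 0.010, 0.012])
```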
The concept of sequential imputation assumes a set of conditional regression models appropriately specified for the variables to be imputed. Continuous variables are typically modeled using linear regression models, binary variables using logit/probit regression models, and so on. The specific steps of sequential imputation are given in section 2. Sequential imputation techniques have been implemented in many software packages. For example, Raghunathan, Lepkowski, Van Hoewyk, and Solenberger (2001) developed a SAS macro, "IVEware," which performs numerous computations; one of its main functionalities is variable-by-variable imputation using a sequence of regression models suitable for continuous, categorical, and semicontinuous variables. An R package, "MICE," by van Buuren and Groothuis-Oudshoorn (2011), creates multiple imputations by fully conditional specification, with options for drawing from the underlying conditional predictive distributions, including predictive-mean and propensity-score matching. A similar methodology has also been implemented in Stata in the procedure "ICE" by Royston and White (2011).

Sequential imputation methods can handle high-dimensional data with diverse measurement scales more easily than methods drawing imputations from joint models. An added advantage pertains to skip patterns and bounds, which are common in many surveys. Because variables are modeled in a sequential fashion, such features can be built more easily into each single-outcome model. These practical aspects have proven to be the main factors contributing to the popularity of sequential approaches. It has been noted (Little and Rubin 2002) that these approaches do not necessarily imply a coherent joint imputation model and hence might not produce proper multiple imputations (Rubin 1987). However, in many complex situations, they have been shown to work well, and their theoretical and empirical aspects have been investigated by several authors. Liu, Gelman, Hill, Su, and Kropko (2014) theoretically characterize their operational and statistical properties and provide conditions for imputations to be proper: if the conditional distributions are compatible, then the multiple imputations are able to capture the variation in the underlying posterior distributions; if not, MI estimators using Rubin's (1987) prescription are still consistent. Lee and Carlin (2010) and Kropko, Goodrich, Gelman, and Hill (2014) investigate the performance of the two approaches (joint versus sequential) in imputation of continuous and categorical variables. In both studies, the two approaches lead to similar inferential conclusions, and Kropko et al. (2014) report that the sequential approach leads to more accurate results in settings with categorical variables.

Most implementations of sequential imputation methods have so far made use of fixed-effects models. When the underlying design (e.g., longitudinal or two-stage sampling) leads to clustered data (figure 1), these methods have often accounted for clustering by including indicators of clusters as auxiliary information in the individual fixed-effects models. However, as discussed by Andridge (2011) in settings underlying cluster randomized trials, this approach leads to overestimation of intraclass correlations, resulting in positively biased MI variance estimators.
More recently, van Buuren and Groothuis-Oudshoorn (2011) modified the R package MICE to include random effects for continuous variables via linear mixed-effects models; see also van Buuren (2012). Resche-Rigon and White (2016) subsequently studied a similar algorithm in more detail and proposed new imputation methods, based on variable-by-variable algorithms, for multilevel normal data. However, similar models for categorical variables have not emerged within the context of variable-by-variable imputation. A joint modeling approach appropriate for clustered data has been developed through multivariate extensions of linear mixed-effects models by Schafer and Yucel (2002); this approach has also been implemented as an R package, "PAN." Handling categorical variables with this approach is only possible through continuous approximations to the variables, which can fail in certain instances (Yucel, He, and Zaslavsky 2008, 2011). In another implementation, "REALCOM" within MLwiN, Carpenter, Goldstein, and Kenward (2011) make use of MCMC techniques to draw multiple imputations in multilevel applications for mixtures of variables, based on a joint model with normal latent variables underlying the categorical variables. REALCOM can be prone to computational problems as the number of categorical variables increases. In its implementation as an R package, "jomo" (Quartagno and Carpenter 2016), this computational aspect has been significantly improved. As mentioned previously, however, such a joint modeling approach can be difficult or infeasible when the data have skip patterns and/or bounds.

Figure 1. Clustered Incomplete Data with Skip Patterns. [Figure omitted.]

In this paper, we introduce a different approach to incorporating random effects for discrete outcomes in imputation for missing data of mixed types. Our method uses sequential imputation with a normal linear mixed model for each continuous outcome and a generalized linear mixed model for each discrete outcome. We refer to the approach as sequential hierarchical regression imputation, or "SHRIMP." We investigate, via simulation, the potential gains from (i) fully incorporating random effects to represent clustering and (ii) modeling discrete outcomes using models that are more appropriate for them. Although variable-by-variable methods are most useful with high-dimensional data, several different measurement scales, skip patterns, and bounds, we consider a simple case involving a combination of continuous and binary variables, both for computational simplicity and to facilitate comparisons with methods that are not designed to handle such complexities.

Section 2 of the paper outlines our variable-by-variable imputation approach: the models used to impute missing values in the individual variables are stated, and computational methods approximating the posterior predictive distribution under each of these models are discussed. Section 3 presents the simulation study, which assesses the performance of SHRIMP and contrasts SHRIMP with the PAN, MICE, and (briefly) jomo approaches. Finally, section 4 discusses the strengths and limitations of our approach and areas for further research.

2. SEQUENTIAL IMPUTATION

Let Z denote a data matrix containing completely observed auxiliary variables and Yk (k = 1, 2, ..., r) denote a set of incompletely observed continuous or categorical variables.
Without loss of generality, we assume that these variables are ordered with respect to the amount of missing values, from least to most. We model each variable Yk sequentially using an appropriate regression model. The resulting set of regression models is the basis for approximating the posterior predictive distribution. Consider, for example, a survey where income, education, and race are incompletely observed. The sequential imputation process models each of these variables with an appropriate regression model using all other variables, subject to missingness or not. The process samples the missing values one variable at a time, using the other, completed or fully observed, variables as covariates. Sections 2.1 and 2.2 below explain how this proceeds computationally.

SHRIMP enhances sequential imputation approaches by applying mixed-effects models to handle clustering, whether due to design (multistage sampling or longitudinal data collection) or to a naturally occurring hierarchy (e.g., patients within clinics or hospitals). Thus, distinct sources of variation are incorporated at the imputation stage. The mixed-effects models are designed to be congenial with survey inferences, which are often based only on cluster-level (e.g., PSU-level) variability (Meng 1994; Reiter, Raghunathan, and Kinney 2006). As in previous variable-by-variable implementations, SHRIMP proceeds in a cyclical manner, where in each cycle the conditional posterior predictive distribution derived under the corresponding mixed-effects model is sampled. We denote these distributions by $f_{Y_k}(Y_k \mid Z, Y_{-k}, \theta_{Y_k})$, where $Y_{-k} = \{Y_j : j \neq k\}$ denotes all the variables other than Yk, Z denotes the fully observed variables, and $\theta_{Y_k}$ denotes the unknown parameters of the conditional distribution. Each of these conditionals is modeled by an appropriate regression model (see section 2.1). Missing values are drawn from their conditional posterior predictive distributions given the observed and imputed values (see section 2.2). For additional computational details, see Appendix A.

One set of imputations for the missing values is drawn after a predetermined number c of cycles, which are assumed to yield an adequate approximation to the marginal posterior predictive distribution of the missing data. In the first cycle, starting from the variable with the fewest missing values, we regress each variable Yk on all completely observed variables and impute its missing values. After this initial cycle imputes all missing values, the second cycle draws the imputed values for each variable with missingness based on the regression model specified for that variable, with the other Z and Y variables as covariates. Missing values are sampled using approximate conditional posterior predictive distributions. As each variable is imputed, the previous imputed values are overwritten with the new ones. After c cycles are completed, we take the final imputed values as the set of imputations. Multiple imputations are created by repeating the c-cycle imputation process multiple times. Inferences from the multiply imputed data are conducted following the MI combining rules (Rubin 1987). Assuming a good approximation to the posterior predictive distribution of the missing data, those combining rules will lead to inferences that have the desired properties (e.g., unbiasedness, nominal coverage) for a sufficiently large sample size and number of imputations (Rubin 1987; Rubin and Schenker 1986; Reiter and Raghunathan 2007).
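To make the cycling concrete, here is a minimal, self-contained Python sketch of the control flow just described. It is only a sketch: ordinary least-squares regressions stand in for the mixed-effects models of section 2.1, a crude mean fill stands in for the initial cycle, and the parameter-uncertainty draws that proper MI requires are omitted; all names are ours.

```python
# Schematic of variable-by-variable (sequential) imputation cycling.
import numpy as np

def sequential_impute(Y, miss, n_cycles=10, rng=np.random.default_rng(0)):
    """Y: (n, r) array with NaNs at missing cells; miss: boolean mask of Y."""
    Y = Y.copy()
    order = np.argsort(miss.sum(axis=0))        # least to most missing
    col_means = np.nanmean(Y, axis=0)           # crude initial fill
    for k in range(Y.shape[1]):
        Y[miss[:, k], k] = col_means[k]
    for _ in range(n_cycles):
        for k in order:
            # regress variable k on all other (completed) variables
            X = np.column_stack([np.ones(len(Y)), np.delete(Y, k, axis=1)])
            obs = ~miss[:, k]
            beta, *_ = np.linalg.lstsq(X[obs], Y[obs, k], rcond=None)
            resid = Y[obs, k] - X[obs] @ beta
            sigma = resid.std(ddof=X.shape[1])
            # overwrite the previous imputations with fresh draws
            Y[miss[:, k], k] = (X[miss[:, k]] @ beta
                                + rng.normal(0.0, sigma, miss[:, k].sum()))
    return Y
```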
Postulating a model for imputing from the implied posterior predictive distribution of the missing data is the most crucial stage of the imputation process. This model should be capable of preserving design features and relationships among incomplete and complete variables that may be subject to current or future analyses. Further, to the extent possible, the model should be rich enough to account for causes of missingness, to increase the plausibility of an underlying missing at random (MAR) assumption. The naming convention for missingness mechanisms is due to Rubin (1976); MAR refers to missingness probabilities that depend on the observed data (covariate and/or response) but not on the missing data. See Schafer (1997) and Little and Rubin (2002) for more discussion of imputation models as well as missingness mechanisms.

2.1 Conditional Regression Models

Below we describe the models used to impute individual variables conditional on covariates. Here we only consider models appropriate for a single level of clustering. Extensions to handle more nested and/or non-nested levels of clustering are straightforward; however, care should be taken to avoid computational complexities due to intractability and lack of convergence (see Appendix A for the computational algorithm for simulating the posterior predictive distribution under a logistic mixed-effects regression model).

Let yij denote a value of a random variable Y (continuous, categorical, or count) for subject j = 1, 2, ..., ni in cluster i = 1, 2, ..., m. We assume the link function

$$g\left(E(y_{ij} \mid x_{ij}, w_i, \theta)\right) = \eta_{ij}, \quad (1)$$

where g(·) denotes a function linking the expected value of the response yij to the linear predictor

$$\eta_{ij} = x_{ij}\beta + w_i b_i, \quad (2)$$

and xij and wi are vectors of covariates corresponding to individual- and cluster-specific characteristics, β and bi are the corresponding regression coefficients (population-average fixed effects and cluster-specific random effects, respectively), and θ denotes the unknown parameters, such as regression coefficients and variance components, governing the underlying model. In the remainder of this paper, we assume a random-intercept-only model, so that $w_i \equiv 1$ and the bi are scalars, with $b_i \sim N(0, \sigma_b^2)$ independently for i = 1, 2, ..., m. In applications with clustered survey data, it is often sufficient to assume a random-intercept model to account for cluster-specific variation. In other applications, such as longitudinal studies, it may be necessary to include more structure, such as a random slope, to reflect variation in the slopes across subjects. The computational algorithms presented here pertain only to binary and continuous variables. Other models are straightforward extensions that follow the same computational logic and are discussed in section 4. For binary yij, we use

$$g\left(P(y_{ij} = 1 \mid x_{ij}, w_i, \theta)\right) = \log \frac{P(y_{ij} = 1 \mid x_{ij}, w_i, \theta)}{1 - P(y_{ij} = 1 \mid x_{ij}, w_i, \theta)}, \quad (3)$$

and for continuous yij, we use

$$g\left(E(y_{ij} \mid x_{ij}, w_i, \theta)\right) = E(y_{ij} \mid x_{ij}, w_i, \theta). \quad (4)$$

2.2 Computational Algorithms

SHRIMP samples from the conditional posterior predictive distributions of the variables sequentially. These distributions are derived under the imposed regression models; for a binary variable, for example, a logistic mixed-effects regression model is used. Because the implied posterior predictive distribution is not tractable, we employ approximations that make use of weak prior distributions, maximum likelihood estimates, and the information around them. The use of priors helps to incorporate uncertainty about the parameters.
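Before turning to the algorithms, the following small Python sketch recaps how the linear predictor (2) and the links (3) and (4) specialize to the random-intercept case; the array names are ours and purely illustrative.

```python
# Linear predictor and links for a random-intercept model (w_i = 1).
import numpy as np

def eta(X, beta, b, cluster):
    """eta_ij = x_ij beta + b_i, equation (2) with w_i = 1."""
    return X @ beta + b[cluster]

def mean_binary(X, beta, b, cluster):
    """Inverse of the logit link (3): P(y_ij = 1) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta(X, beta, b, cluster)))

def mean_continuous(X, beta, b, cluster):
    """Identity link (4): E(y_ij) equals the linear predictor itself."""
    return eta(X, beta, b, cluster)
```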
Below we describe the algorithms for linear and logistic mixed-effects regression models.

2.2.1 Linear Mixed-Effects Regression Models

Corresponding to (1), (2), and (4), our first imputation model, for a continuous variable yij, is a linear, mixed-effects, random-intercept-only ($w_i \equiv 1$) model

$$y_{ij} = x_{ij}\beta + b_i + \varepsilon_{ij},$$

where xij, the individual-level covariates, are either completely observed or imputed in the previous cycles of SHRIMP, and β and bi are the fixed and random effects, respectively. Following standard practice, we assume normal distributions for the error terms and random effects: $\varepsilon_{ij} \sim N(0, \sigma_e^2)$ and $b_i \sim N(0, \sigma_b^2)$ independently for i = 1, 2, ..., m and j = 1, 2, ..., ni.

A Gibbs sampling algorithm is employed to simulate the joint posterior of β, b1, ..., bm, σb, σe and the missing data (ymis). To proceed with the Gibbs sampler, conditional posterior distributions for $\theta = (\beta, \sigma_b, \sigma_e)$ and b1, ..., bm need to be derived. We assume inverse chi-square priors $\chi^{-2}_{\nu_b}$ and $\chi^{-2}_{\nu_e}$ for the variances $\sigma_b^2$ and $\sigma_e^2$, respectively, and improper uniform priors on the fixed effects. An algorithm similar to that given by Schafer and Yucel (2002) is adopted in our setting and proceeds by sampling iteratively from

$$\beta \mid x, w, b_1, \ldots, b_m, \sigma_e^2, \sigma_b^2 \sim N(\hat{\beta}, V_{\hat{\beta}}), \quad (5)$$

$$\sigma_e^2 \mid x, w, b_1, \ldots, b_m, \beta, \sigma_b^2 \sim \left(1 + \sum_{i,j} \hat{\varepsilon}_{ij}^{\,2}\right) \chi^{-2}_{\nu_e + N - 1}, \quad (6)$$

$$b_i \mid x, w, \beta, \sigma_e^2, \sigma_b^2 \sim N(\hat{b}_i, V_{\hat{b}_i}), \quad (7)$$

$$\sigma_b^2 \mid x, w, b_1, \ldots, b_m, \beta, \sigma_e^2 \sim \left(\nu_b + \sum_i b_i^2\right) \chi^{-2}_{\nu_b + m}, \quad (8)$$

where x and w denote the full sets of covariates (with $w_i \equiv 1$), and

$$V_{\hat{\beta}} = \sigma_e^2 \left(\sum_{i,j} x_{ij}^T x_{ij}\right)^{-1}, \qquad \hat{\beta} = \frac{V_{\hat{\beta}}}{\sigma_e^2} \sum_{i,j} x_{ij}^T (y_{ij} - b_i), \qquad \hat{\varepsilon}_{ij} = y_{ij} - x_{ij}\hat{\beta} - w_i b_i,$$

$$N = \sum_{i=1}^{m} n_i, \qquad V_{\hat{b}_i} = \left(\frac{n_i}{\sigma_e^2} + \frac{1}{\sigma_b^2}\right)^{-1}, \qquad \hat{b}_i = \frac{V_{\hat{b}_i}}{\sigma_e^2} \sum_{j=1}^{n_i} (y_{ij} - x_{ij}\beta).$$

For a given set of values β, σb, σe, b1, ..., bm, each $y_{ij} \in y_{\mathrm{mis}}$ can be drawn from

$$y_{ij} \mid x, w, \beta, \sigma_e^2, \sigma_b^2, b_1, \ldots, b_m \sim N(x_{ij}\beta + b_i,\; \sigma_b^2 + \sigma_e^2). \quad (9)$$

The steps given by (5) to (9) define one cycle of the Gibbs sampler, and executing such cycles repeatedly creates sequences of the unknown parameters {θ(1), θ(2), ...} and {ymis(1), ymis(2), ...}, whose limiting distributions are P(θ | yobs, x, w) and P(ymis | yobs, x, w), respectively, where yobs denotes the observed Y-values. We use a draw of ymis as a single random sample from the posterior predictive distribution of the missing data upon convergence. We assess convergence by autocorrelation function plots and Gelman-Rubin statistics. (A minimal computational sketch of one Gibbs cycle is given at the end of section 2.2.)

2.2.2 Logistic Mixed-Effects Regression Models

For a binary variable yij, based on the observed values, we first compute the maximum likelihood estimates of the parameters under the logistic regression model corresponding to (1) to (3). Then we use these estimates, along with the information matrix, to develop an approximation to the underlying posterior distribution of the unknown parameters and the posterior predictive distribution of the missing data. Specifically, we work with the following likelihood function:

$$L(\theta, b_1, \ldots, b_m \mid y, x, w) = \prod_{i=1}^{m} \prod_{j=1}^{n_i} F(\eta_{ij})^{y_{ij}} \left(1 - F(\eta_{ij})\right)^{1 - y_{ij}}, \quad (10)$$

where $\theta = (\beta, \sigma_b)$, y is the set of Y-values, and $F(\eta) = (1 + e^{-\eta})^{-1}$. Our algorithm requires maximizing the marginal likelihood function

$$L(\theta \mid y, x, w) = \int_b L(\theta \mid y, x, w, b)\, f(b)\, db, \quad (11)$$

where f(b) represents the probability distribution of the vector b of random effects, the components of which we assume to be independently distributed as $N(0, \sigma_b^2)$. Because there is no closed-form solution to this maximization problem, we employ Gauss-Hermite quadrature (Abramowitz and Stegun 1972), which is fast and accurate for a random-intercept-only model (Raudenbush, Yang, and Yosef 2000; Hedeker and Gibbons 1994). (See Appendix A for computational details.) A Laplace approximation might be preferable for a richer model, such as a random-slope model.
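The following Python sketch spells out one Gibbs cycle of section 2.2.1, steps (5) to (9), for the random-intercept linear model. It is a sketch under our reading of the conditional distributions above (in particular the degrees of freedom in (6) and (8)), with illustrative names throughout.

```python
# One Gibbs cycle for the random-intercept linear mixed model, steps (5)-(9).
import numpy as np

rng = np.random.default_rng(0)

def gibbs_cycle(y, X, cluster, b, sigma2_e, sigma2_b, nu_e=1.0, nu_b=1.0):
    m, N = b.shape[0], y.shape[0]
    # (5) fixed effects given random effects and variances
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(XtX_inv @ X.T @ (y - b[cluster]),
                                   sigma2_e * XtX_inv)
    # (6) residual variance, a scaled inverse chi-square draw
    eps = y - X @ beta - b[cluster]
    sigma2_e = (1.0 + eps @ eps) / rng.chisquare(nu_e + N - 1)
    # (7) each cluster's random intercept
    for i in range(m):
        idx = cluster == i
        V_bi = 1.0 / (idx.sum() / sigma2_e + 1.0 / sigma2_b)
        b_hat = (V_bi / sigma2_e) * np.sum(y[idx] - X[idx] @ beta)
        b[i] = rng.normal(b_hat, np.sqrt(V_bi))
    # (8) random-intercept variance
    sigma2_b = (nu_b + b @ b) / rng.chisquare(nu_b + m)
    return beta, b, sigma2_e, sigma2_b

def draw_missing(X_mis, cluster_mis, beta, b, sigma2_e, sigma2_b):
    # (9) posterior predictive draw for missing y, with the variance as printed
    return rng.normal(X_mis @ beta + b[cluster_mis],
                      np.sqrt(sigma2_b + sigma2_e))
```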
3. SIMULATION ASSESSMENT

By its setup, sequential or conditional sampling does not necessarily imply a random sample from a joint posterior predictive distribution of the missing data (Raghunathan, Lepkowski, Van Hoewyk, and Solenberger 2001; Little and Rubin 2002; van Buuren, Brand, Groothuis-Oudshoorn, and Rubin 2006; van Buuren 2012). Our simulation study assesses the performance of SHRIMP in terms of frequentist properties. Further, the study compares SHRIMP with PAN, which indeed samples from a joint posterior predictive distribution but uses a continuous (normal) approximation for discrete variables to do so. The study also compares SHRIMP with MICE, which uses sequential regression imputation but incorporates random effects only for the continuous variables and not the discrete ones.

The framework of the simulation study consists of complete-data generation from a joint mixed-effects model, imposition of missing data under missing completely at random (MCAR) and missing at random (MAR) mechanisms, MI inferences for model parameters and population quantities, and evaluation of the MI inferences with respect to a set of criteria stated below. We focus on MCAR and MAR mechanisms because those mechanisms underlie the models we use. We also briefly mention results from a simulation involving incomplete data under a missing not at random (MNAR) mechanism, without presenting numerical results.

3.1 Study Design

3.1.1 Data Generation

Let Y1 and Y2 denote continuous and binary variables, respectively. The joint distribution [Y1, Y2] underlying the data-generation mechanism was modeled via the marginal distribution of the continuous component and the conditional distribution of the binary component given the continuous component, that is, [Y1, Y2] = [Y1][Y2 | Y1]. Specifically, Y1 was first simulated under a linear model with varying intercepts:

$$y_{1ij} = 1 + b_i^{Y_1} + \varepsilon_{ij}, \quad (12)$$

where i = 1, 2, ..., 50 and j = 1, 2, ..., 10 index the clusters and units within a cluster, respectively, and where $b_i^{Y_1} \sim N(0, \sigma_{b_1}^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_e^2)$. To investigate behavior under varying intracluster correlations (ICC), we allowed $\sigma_{b_1}$ to vary from 0.5 to 2 and $\sigma_e$ to vary from 2 to 4. Next, we simulated Y2 conditional on Y1:

$$\mathrm{logit}\left(P(y_{2ij} = 1 \mid y_{1ij}, b_i^{Y_2})\right) = -1 + 0.5\, y_{1ij} + b_i^{Y_2},$$

where i and j are the same indices for clusters and subjects. Similar to (12), this model allowed the intercepts to vary across the clusters, with $b_i^{Y_2} \sim N(0, \sigma_{b_2}^2)$. Values of the ICC similar to those in the continuous model were studied by varying $\sigma_{b_2}$ between 0.4 and 0.9 (for an ICC range between 0.05 and 0.2); we only report results for ICC = 0.05 and ICC = 0.2.

Next, we simulated variables that were used in imposing missing values for the MAR mechanism. These variables were also used as auxiliary variables in the imputation models. We simulated the variables $Z_1 \mid b_i^{Z_1} \sim N(b_i^{Z_1}, 2)$ and $Z_2 \mid b_i^{Z_2} \sim N(b_i^{Z_2}, 2)$, where i indexes the clusters, and $b_i^{Z_1}$ and $b_i^{Z_2}$ were distributed independently as N(0, 1).

The next step of data generation pertains to imposing missing values under MCAR and MAR mechanisms (Little and Rubin 2002). Let r1 and r2 denote missingness indicators for y1 and y2. Under MCAR, r1 and r2 were simulated from a Bernoulli distribution. We set the success probability to 20 percent, 30 percent, and 40 percent to study the behavior under these rates of missingness.
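The data-generation steps above can be reproduced with a few lines of Python; this sketch follows the values in the text (equation (12), the conditional logit for Y2, the auxiliaries Z1 and Z2, and an MCAR mask at the 20 percent rate), with variable names of our own. We read N(b, 2) as having variance 2; the text does not say whether 2 is a variance or a standard deviation.

```python
# Simulate one replicate of the study's clustered data and an MCAR mask.
import numpy as np

rng = np.random.default_rng(1)
m, n_i = 50, 10
cluster = np.repeat(np.arange(m), n_i)
sigma_b1, sigma_e, sigma_b2 = 1.0, 2.0, 0.9    # one of the studied settings

b1 = rng.normal(0.0, sigma_b1, m)
y1 = 1.0 + b1[cluster] + rng.normal(0.0, sigma_e, m * n_i)      # (12)

b2 = rng.normal(0.0, sigma_b2, m)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * y1 + b2[cluster])))      # logit model
y2 = rng.binomial(1, p)

bz1, bz2 = rng.normal(0.0, 1.0, m), rng.normal(0.0, 1.0, m)
z1 = rng.normal(bz1[cluster], np.sqrt(2.0))    # Z1 | b ~ N(b, 2), variance 2
z2 = rng.normal(bz2[cluster], np.sqrt(2.0))

r1 = rng.binomial(1, 0.20, m * n_i).astype(bool)               # MCAR masks
r2 = rng.binomial(1, 0.20, m * n_i).astype(bool)
y1_obs = np.where(r1, np.nan, y1)
y2_obs = np.where(r2, np.nan, y2.astype(float))
```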
Under MAR, the probability of missingness of y1 and y2 depended on the fully observed auxiliary variables Z1 and Z2:

$$\mathrm{logit}\left(P(r_1 = 1)\right) = \gamma_0^{Y_1} + \gamma_1^{Y_1} Z_1, \qquad \mathrm{logit}\left(P(r_2 = 1)\right) = \gamma_0^{Y_2} + \gamma_1^{Y_2} Z_2,$$

where the γ values were chosen to lead to approximately 20 percent, 30 percent, and 40 percent rates of missingness.

3.1.2 Estimation and Performance

We gauge the performance of MI inferences under imputation via SHRIMP, PAN, and MICE with simple hypothetical analyst's models:

$$y_{1ij} = \alpha_0 + \alpha_1 y_{2ij} + b_i^{Y_1} + \varepsilon_{ij}, \quad (13)$$

$$\mathrm{logit}\left(P(y_{2ij} = 1 \mid y_{1ij}, b_i^{Y_2})\right) = \beta_0 + \beta_1 y_{1ij} + b_i^{Y_2}, \quad (14)$$

where $b_i^{Y_1} \sim N(0, \tau_{b_1}^2)$, $b_i^{Y_2} \sim N(0, \tau_{b_2}^2)$, and $\varepsilon_{ij} \sim N(0, \tau_e^2)$ are the conventional cluster-specific and residual error terms. (Note: to avoid extraneous notation across the data simulation, analysis, and imputation models, we use identical notation for the random effects and error terms; however, the values of the random effects and error terms can differ across the models.) The parameters in these models are $\alpha = (\alpha_0, \alpha_1)$, $\beta = (\beta_0, \beta_1)$, $\tau_{b_1}^2$, $\tau_{b_2}^2$, and $\tau_e^2$. In addition to the parameters of (13) and (14), we examine the performance of the methods in estimating a bivariate summary measure of the data distribution: the probability of observing Y1 > 0 and Y2 = 1. Each of the estimands was also estimated for each simulated data set before deletion of the missing values (BD), and the averages of the BD estimates across the simulations are used as the "true values."

For each simulation scenario, a total of 10 imputations were created under each MI method, and estimates and their standard errors were combined using Rubin's rules (Rubin 1987). Our choice of 10 imputations was based on considerations of precision, not on the overall computing time. This and other considerations for choosing the number of imputations in general are discussed in Graham, Olchowski, and Gilreath (2007). The specific imputation models for each simulation scenario are given at the bottom of each table of results in the next section and in Appendix B. Simulation of data sets, imposition of missing values, creation of MIs, and estimation based on the multiply imputed data were repeated 1,000 times to create a repeated-sampling environment. We gauged the performance of each method using the following criteria, which were computed by averaging across the 1,000 replications (a small computational sketch follows the list):

Percent bias (PB): for an estimand T, PB is the magnitude of the raw bias relative to the true value of the estimand, calculated as $100 \times |(E(\hat{T}) - T)/T|$. A reasonable upper limit on PB for acceptable performance is 5 percent (Demirtas, Freels, and Yucel 2008).

Confidence-interval-based criteria: the coverage rate (CR) is the proportion of confidence intervals that contain the true estimand value. The actual rate should be close to the nominal rate if a procedure is working well. We consider performance to be poor if the actual CR drops below 90 percent for a 95 percent nominal rate (Collins, Schafer, and Kam 2001). We also report the average width of confidence intervals (AW), defined as the distance between the average lower and upper confidence limits. A high CR along with narrow confidence intervals translates into greater accuracy and higher power.

Root mean square error (RMSE): an integrated measure of bias and variance, defined as $\sqrt{E[(\hat{T} - T)^2]}$. It evaluates an estimator $\hat{T}$ in terms of combined accuracy and precision.
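As a computational companion to the list above, this small Python sketch computes PB, CR, AW, and RMSE from arrays of replication-level results; the names est, lo, and hi (point estimates and 95 percent confidence limits across replications) are ours.

```python
# Evaluation criteria computed across simulation replications.
import numpy as np

def criteria(est, lo, hi, truth):
    est, lo, hi = map(np.asarray, (est, lo, hi))
    pb = 100.0 * abs((est.mean() - truth) / truth)     # percent bias
    cr = np.mean((lo <= truth) & (truth <= hi))        # coverage rate
    aw = hi.mean() - lo.mean()                         # average CI width
    rmse = np.sqrt(np.mean((est - truth) ** 2))        # root mean square error
    return pb, cr, aw, rmse
```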
3.2 Summary of the Results

Tables 1–4 display the true value (TV), average estimate across the simulations (AE), PB, CR, RMSE, and AW for the regression parameters of the analyst's models (13) and (14) and for the bivariate summary measure, the probability of observing Y1 > 0 and Y2 = 1, which is denoted by Q in tables 1–4. We report results for MCAR and MAR mechanisms, a sample size of 500 with 50 clusters, and variations in the ICC and missingness rates.

Table 1. Summary of the Simulation Results when the Missingness Mechanism Is MCAR, with Missingness Rates of 20% for Each Variable and 36% Jointly

ICC = 0.05

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | 0.3384 | 0.3300 | 2.48 | 93.6 | 0.1536 | 0.5247 |
| | PAN | 0.3384 | 0.3633 | 7.38 | 93.7 | 0.1541 | 0.5465 |
| | MICE | 0.3384 | 0.3283 | 2.98 | 91.1 | 0.1595 | 0.5137 |
| α1 | SHRIMP | 1.6920 | 1.7090 | 1.01 | 94.7 | 0.2221 | 0.8288 |
| | PAN | 1.6920 | 1.6170 | 4.43 | 93.9 | 0.2242 | 0.8250 |
| | MICE | 1.6920 | 1.7101 | 1.07 | 90.5 | 0.2273 | 0.8081 |
| β0 | SHRIMP | −0.9958 | −0.9967 | 0.08 | 94.9 | 0.1648 | 0.6255 |
| | PAN | −0.9958 | −0.9523 | 4.37 | 94.7 | 0.1611 | 0.6377 |
| | MICE | −0.9958 | −0.9950 | 0.09 | 90.1 | 0.1582 | 0.6206 |
| β1 | SHRIMP | 0.4987 | 0.5031 | 0.89 | 94.9 | 0.0777 | 0.3012 |
| | PAN | 0.4987 | 0.4762 | 4.50 | 95.0 | 0.0744 | 0.2945 |
| | MICE | 0.4987 | 0.5053 | 1.32 | 89.4 | 0.0784 | 0.2952 |
| Q | SHRIMP | 0.3472 | 0.3480 | 0.23 | 94.5 | 0.0237 | 0.1001 |
| | PAN | 0.3472 | 0.3471 | 0.04 | 96.1 | 0.0226 | 0.1026 |
| | MICE | 0.3472 | 0.3493 | 0.59 | 90.4 | 0.0241 | 0.0995 |

ICC = 0.2

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | −1.2536 | −1.3069 | 4.25 | 92.5 | 0.3006 | 1.1420 |
| | PAN | −1.2536 | −1.1766 | 6.14 | 94.0 | 0.3027 | 1.2430 |
| | MICE | −1.2536 | −1.3449 | 7.28 | 90.4 | 0.3141 | 1.0969 |
| α1 | SHRIMP | 5.2489 | 5.3338 | 1.62 | 95.0 | 0.3520 | 1.4955 |
| | PAN | 5.2489 | 5.0159 | 4.44 | 95.6 | 0.3778 | 1.5390 |
| | MICE | 5.2489 | 5.4205 | 3.27 | 90.2 | 0.3812 | 1.4597 |
| β0 | SHRIMP | −1.0091 | −0.9715 | 3.73 | 94.8 | 0.1858 | 0.7352 |
| | PAN | −1.0091 | −0.9044 | 10.38 | 93.7 | 0.1956 | 0.7593 |
| | MICE | −1.0091 | −0.9965 | 1.25 | 89.2 | 0.1871 | 0.7153 |
| β1 | SHRIMP | 0.5121 | 0.4984 | 2.67 | 91.8 | 0.0596 | 0.2375 |
| | PAN | 0.5121 | 0.4629 | 9.60 | 87.9 | 0.0689 | 0.2324 |
| | MICE | 0.5121 | 0.5140 | 0.37 | 86.3 | 0.0627 | 0.2331 |
| Q | SHRIMP | 0.3877 | 0.3868 | 0.22 | 94.7 | 0.0322 | 0.1231 |
| | PAN | 0.3877 | 0.3820 | 1.45 | 94.2 | 0.0312 | 0.1249 |
| | MICE | 0.3877 | 0.3888 | 0.29 | 91.1 | 0.0321 | 0.1218 |

Imputation model for Y1 is Y1 = λ0 + λ1·Y2 + bY1 + ε.
Imputation model for Y2 is logit(P(Y2 = 1)) = ξ0 + ξ1·Y1 + bY2.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.
Table 2. Summary of the Simulation Results when the Missingness Mechanism Is MAR

ICC = 0.05, with missingness rate of 20% for each variable and 35% jointly

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | 0.3384 | 0.3328 | 1.64 | 94.1 | 0.1527 | 0.5287 |
| | PAN | 0.3384 | 0.3568 | 5.43 | 93.9 | 0.1479 | 0.5524 |
| | MICE | 0.3384 | 0.3275 | 3.21 | 89.1 | 0.1505 | 0.4856 |
| α1 | SHRIMP | 1.6920 | 1.6843 | 0.45 | 94.5 | 0.2027 | 0.8350 |
| | PAN | 1.6920 | 1.6019 | 5.33 | 91.9 | 0.2178 | 0.8396 |
| | MICE | 1.6920 | 1.6952 | 0.19 | 88.5 | 0.2320 | 0.7368 |
| β0 | SHRIMP | −0.9958 | −0.9851 | 1.08 | 95.1 | 0.1492 | 0.6254 |
| | PAN | −0.9958 | −0.9375 | 5.86 | 95.0 | 0.1551 | 0.6393 |
| | MICE | −0.9958 | −0.9912 | 0.46 | 86.7 | 0.1643 | 0.5448 |
| β1 | SHRIMP | 0.4987 | 0.4906 | 1.62 | 94.6 | 0.0726 | 0.3002 |
| | PAN | 0.4987 | 0.4672 | 6.30 | 92.5 | 0.0745 | 0.2954 |
| | MICE | 0.4987 | 0.4975 | 0.23 | 89.3 | 0.0817 | 0.2583 |
| Q | SHRIMP | 0.3472 | 0.3456 | 0.47 | 94.8 | 0.0226 | 0.1004 |
| | PAN | 0.3472 | 0.3454 | 0.52 | 94.9 | 0.0227 | 0.1021 |
| | MICE | 0.3472 | 0.3474 | 0.05 | 93.0 | 0.0252 | 0.0923 |

ICC = 0.2, with missingness rates of 22% for Y1, 23% for Y2, and 40% jointly

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | −1.2536 | −1.2331 | 1.63 | 93.9 | 0.3137 | 1.1887 |
| | PAN | −1.2536 | −1.1414 | 8.95 | 94.2 | 0.3374 | 1.2840 |
| | MICE | −1.2536 | −1.3146 | 4.86 | 88.9 | 0.2998 | 1.0733 |
| α1 | SHRIMP | 5.2489 | 5.2573 | 0.16 | 95.1 | 0.3254 | 1.5066 |
| | PAN | 5.2489 | 4.9972 | 4.80 | 94.4 | 0.4068 | 1.5670 |
| | MICE | 5.2489 | 5.3753 | 2.41 | 90.1 | 0.3655 | 1.4154 |
| β0 | SHRIMP | −1.0091 | −0.9801 | 2.87 | 92.9 | 0.1970 | 0.7491 |
| | PAN | −1.0091 | −0.9107 | 9.75 | 93.1 | 0.2074 | 0.7734 |
| | MICE | −1.0091 | −0.9966 | 1.24 | 85.9 | 0.2031 | 0.6545 |
| β1 | SHRIMP | 0.5121 | 0.4881 | 4.69 | 93.5 | 0.0587 | 0.2305 |
| | PAN | 0.5121 | 0.4560 | 10.95 | 88.7 | 0.0735 | 0.2278 |
| | MICE | 0.5121 | 0.5059 | 1.20 | 86.7 | 0.0631 | 0.2066 |
| Q | SHRIMP | 0.3877 | 0.3852 | 0.64 | 94.3 | 0.0349 | 0.1245 |
| | PAN | 0.3877 | 0.3813 | 1.64 | 93.4 | 0.0343 | 0.1257 |
| | MICE | 0.3877 | 0.3866 | 0.26 | 87.1 | 0.0367 | 0.1185 |

Imputation model for Y1 is Y1 = λ0 + λ1·Y2 + λ2·Z1 + λ3·Z2 + bY1 + ε.
Imputation model for Y2 is logit(P(Y2 = 1)) = ξ0 + ξ1·Y1 + ξ2·Z1 + ξ3·Z2 + bY2.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.
Table 3. Summary of the Simulation Results when the Missingness Mechanism Is MAR, with Missingness Rates of Approximately 30% for Each Variable and 50% Jointly

ICC = 0.05

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | 0.3384 | 0.3476 | 2.72 | 93.7 | 0.1609 | 0.5901 |
| | PAN | 0.3384 | 0.3859 | 14.03 | 93.4 | 0.1571 | 0.6468 |
| | MICE | 0.3384 | 0.3485 | 3.00 | 89.3 | 0.1639 | 0.5118 |
| α1 | SHRIMP | 1.6920 | 1.6646 | 1.62 | 94.4 | 0.2526 | 0.9161 |
| | PAN | 1.6920 | 1.5582 | 7.91 | 94.6 | 0.2618 | 0.9486 |
| | MICE | 1.6920 | 1.6698 | 1.31 | 87.6 | 0.2618 | 0.7901 |
| β0 | SHRIMP | −0.9958 | −0.9898 | 0.60 | 94.7 | 0.1809 | 0.6991 |
| | PAN | −0.9958 | −0.9288 | 6.74 | 97.1 | 0.1673 | 0.7451 |
| | MICE | −0.9958 | −0.9941 | 0.18 | 87.1 | 0.1798 | 0.5731 |
| β1 | SHRIMP | 0.4987 | 0.4859 | 2.55 | 94.6 | 0.0840 | 0.3316 |
| | PAN | 0.4987 | 0.4570 | 8.36 | 94.1 | 0.0842 | 0.3380 |
| | MICE | 0.4987 | 0.4924 | 1.26 | 84.4 | 0.0903 | 0.2756 |
| Q | SHRIMP | 0.3472 | 0.3436 | 1.03 | 94.1 | 0.0297 | 0.1086 |
| | PAN | 0.3472 | 0.3448 | 0.70 | 94.0 | 0.0279 | 0.1128 |
| | MICE | 0.3472 | 0.3451 | 0.61 | 90.1 | 0.0290 | 0.0954 |

ICC = 0.2

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | −1.2536 | −1.2374 | 1.29 | 93.9 | 0.4122 | 1.3090 |
| | PAN | −1.2536 | −1.1097 | 11.48 | 91.0 | 0.4267 | 1.4705 |
| | MICE | −1.2536 | −1.3114 | 4.61 | 82.5 | 0.3588 | 1.0912 |
| α1 | SHRIMP | 5.2489 | 5.2743 | 0.48 | 94.9 | 0.3936 | 1.6645 |
| | PAN | 5.2489 | 4.8872 | 6.89 | 90.1 | 0.5117 | 1.7465 |
| | MICE | 5.2489 | 5.4462 | 3.76 | 88.3 | 0.4693 | 1.5162 |
| β0 | SHRIMP | −1.0091 | −0.9625 | 4.62 | 94.3 | 0.2295 | 0.8366 |
| | PAN | −1.0091 | −0.8504 | 15.72 | 91.1 | 0.2607 | 0.8872 |
| | MICE | −1.0091 | −1.0089 | 0.02 | 84.6 | 0.2536 | 0.6945 |
| β1 | SHRIMP | 0.5121 | 0.4783 | 6.60 | 93.4 | 0.0790 | 0.2550 |
| | PAN | 0.5121 | 0.4289 | 16.25 | 87.0 | 0.1002 | 0.2458 |
| | MICE | 0.5121 | 0.5056 | 1.26 | 83.2 | 0.0865 | 0.2305 |
| Q | SHRIMP | 0.3877 | 0.3843 | 0.86 | 93.7 | 0.0362 | 0.1287 |
| | PAN | 0.3877 | 0.3779 | 2.52 | 90.6 | 0.0364 | 0.1317 |
| | MICE | 0.3877 | 0.3888 | 0.30 | 83.1 | 0.0371 | 0.1172 |

Imputation model for Y1 is Y1 = λ0 + λ1·Y2 + λ2·Z1 + λ3·Z2 + bY1 + ε.
Imputation model for Y2 is logit(P(Y2 = 1)) = ξ0 + ξ1·Y1 + ξ2·Z1 + ξ3·Z2 + bY2.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.
Table 4. Summary of the Simulation Results when the Missingness Mechanism Is MAR, with Missingness Rates of Approximately 38% for Each Variable and 60% Jointly

ICC = 0.05

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | 0.3384 | 0.3612 | 6.75 | 93.9 | 0.2039 | 0.6883 |
| | PAN | 0.3384 | 0.4218 | 24.66 | 94.1 | 0.2043 | 0.7910 |
| | MICE | 0.3384 | 0.3666 | 8.33 | 83.2 | 0.1811 | 0.5528 |
| α1 | SHRIMP | 1.6920 | 1.6474 | 2.64 | 94.6 | 0.2504 | 1.0763 |
| | PAN | 1.6920 | 1.5039 | 11.11 | 93.7 | 0.2925 | 1.1008 |
| | MICE | 1.6920 | 1.6304 | 3.64 | 88.7 | 0.2780 | 0.9223 |
| β0 | SHRIMP | −0.9958 | −0.9921 | 0.37 | 91.7 | 0.2354 | 0.8121 |
| | PAN | −0.9958 | −0.9240 | 7.22 | 91.6 | 0.2089 | 0.9089 |
| | MICE | −0.9958 | −0.9825 | 1.34 | 87.8 | 0.2160 | 0.6484 |
| β1 | SHRIMP | 0.4987 | 0.4825 | 3.25 | 95.0 | 0.1000 | 0.3882 |
| | PAN | 0.4987 | 0.4411 | 11.55 | 92.5 | 0.1005 | 0.3936 |
| | MICE | 0.4987 | 0.4803 | 3.68 | 86.3 | 0.1079 | 0.3281 |
| Q | SHRIMP | 0.3472 | 0.3424 | 1.38 | 92.3 | 0.0329 | 0.1174 |
| | PAN | 0.3472 | 0.3418 | 1.57 | 93.9 | 0.0287 | 0.1278 |
| | MICE | 0.3472 | 0.3422 | 1.44 | 90.2 | 0.0283 | 0.1018 |

ICC = 0.2

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | −1.2536 | −1.2760 | 1.79 | 94.7 | 0.4013 | 1.4241 |
| | PAN | −1.2536 | −1.0847 | 13.48 | 93.9 | 0.4366 | 1.6360 |
| | MICE | −1.2536 | −1.3336 | 6.38 | 82.1 | 0.3687 | 1.1620 |
| α1 | SHRIMP | 5.2489 | 5.2610 | 0.23 | 95.0 | 0.4336 | 1.8882 |
| | PAN | 5.2489 | 4.7939 | 8.67 | 91.4 | 0.6115 | 2.0009 |
| | MICE | 5.2489 | 5.3678 | 2.26 | 86.5 | 0.5192 | 1.7143 |
| β0 | SHRIMP | −1.0091 | −0.9504 | 5.81 | 92.9 | 0.2838 | 0.9370 |
| | PAN | −1.0091 | −0.8349 | 17.26 | 91.3 | 0.2779 | 1.0414 |
| | MICE | −1.0091 | −0.9527 | 5.59 | 82.6 | 0.2736 | 0.7518 |
| β1 | SHRIMP | 0.5121 | 0.4701 | 8.20 | 91.9 | 0.0817 | 0.2882 |
| | PAN | 0.5121 | 0.4121 | 19.51 | 67.9 | 0.1117 | 0.2740 |
| | MICE | 0.5121 | 0.4778 | 6.69 | 83.1 | 0.0889 | 0.2638 |
| Q | SHRIMP | 0.3877 | 0.3800 | 1.99 | 91.8 | 0.0394 | 0.1357 |
| | PAN | 0.3877 | 0.3723 | 3.96 | 91.1 | 0.0382 | 0.1398 |
| | MICE | 0.3877 | 0.3826 | 1.30 | 82.3 | 0.0393 | 0.1175 |

Imputation model for Y1 is Y1 = λ0 + λ1·Y2 + λ2·Z1 + λ3·Z2 + bY1 + ε.
Imputation model for Y2 is logit(P(Y2 = 1)) = ξ0 + ξ1·Y1 + ξ2·Z1 + ξ3·Z2 + bY2.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.

Tables 1 and 2 present results for MCAR and MAR, respectively. The rates of missingness for Y1 and Y2 were set to approximately 20 percent marginally each and 36 percent jointly. PB values above 5 percent and CR values below 90 percent fall outside the acceptable limits defined in section 3.1.2. Generally, SHRIMP outperforms PAN and MICE with respect to PB, CR, and the other criteria. SHRIMP has no PB or CR values outside the acceptable limits, whereas PAN has several unacceptable PB values and one unacceptable CR, and MICE has several unacceptable CRs. Biases such as those associated with PAN are typical for methods whose underlying imputation models assume a Gaussian structure for binary variables whose observed distributions are asymmetric (Yucel, He, and Zaslavsky 2008). Although MICE leads to estimates with low bias, its failure to account for part of the cluster-level variation leads to inferences that are artificially too precise, in other words, narrower CIs and hence lower-than-nominal CRs. More often than not, SHRIMP has the lowest RMSE of the three methods, particularly when ICC = 0.2. SHRIMP also usually has a lower AW than does PAN. (We do not compare SHRIMP and MICE with respect to AW because of MICE's lower-than-nominal CRs.) The three methods have the most similar performance under the MCAR mechanism with ICC = 0.05. The better performance of SHRIMP becomes more apparent when the missing data are MAR (which is more realistic than MCAR in most multivariate settings), as well as when the ICC increases.
Tables 3 and 4 display results under the MAR mechanism with higher rates of missingness (up to 60 percent). As might be expected, the performance of all of the methods degrades to some extent. For example, SHRIMP and MICE now have a few PB values outside the acceptable limits, PAN's PBs tend to increase, and MICE's CRs tend to decrease. Nevertheless, the general trends described above for tables 1 and 2 persist. Moreover, the better performance of SHRIMP, particularly relative to PAN regarding PB and to MICE regarding CR, becomes more pronounced.

Although not shown here, we observed quite dramatic changes in the performance of the methods when the missingness mechanism was MNAR. SHRIMP performed the best for most of the estimands and was outperformed only by PAN with respect to inference for the intercept parameters. However, none of the methods performed particularly well under MNAR, as would be expected given that the methods assume MAR.

In a limited simulation study, we also compared the performance of our methods with that of the recently implemented R package jomo (Quartagno and Carpenter 2016), which was described in section 1. For settings with ICC = 0.2, MAR, and around 50 percent joint missingness, a summary of the results is provided in Appendix B. Overall, jomo and SHRIMP performed similarly with respect to the criteria given in section 3.1.2, although jomo had a slightly lower coverage rate, along with higher bias and RMSE, for β0. When joint modeling is possible, jomo presents a viable software tool for conducting MI inference.

We note several points. The first is the role of the priors, which is negligible in most settings. We compared the inverse chi-square prior and the half-t family of priors on the variances of the random-effect terms (Gelman 2006). The results (not shown here) indicate no discernible difference due to the selection of priors. The second pertains to the goodness of fit of the imputation models: we assessed the fit of the imputation models against the complete simulated data, and the models fit the data well. The third point relates to improving the coherence of the variable-by-variable procedures with the underlying joint predictive distribution of the missing data. To investigate the sensitivity to incoherence with regard to the random effects, we conducted additional computations with procedures that augmented each conditional imputation model with the predicted random effects from the other conditional models in each cycle, to improve the overall consistency with the joint modeling approach. The augmented imputation models in imputation cycle c are

$$y_{1ij} = \lambda_0 + \lambda_1 y_{2ij} + \lambda_2 z_{1ij} + \lambda_3 z_{2ij} + \lambda_4 b_i^{Y_2(c-1)} + b_i^{Y_1} + \varepsilon_{ij},$$

$$\mathrm{logit}\left(P(y_{2ij} = 1 \mid y_{1ij}, b_i^{Y_2})\right) = \xi_0 + \xi_1 y_{1ij} + \xi_2 z_{1ij} + \xi_3 z_{2ij} + \xi_4 b_i^{Y_1(c)} + b_i^{Y_2},$$

where c − 1 and c in parentheses refer to the cycle number of the estimate. Our simulation-based investigation showed that the improvements due to using the augmented imputation models were not substantial under the MCAR and MAR scenarios, although there were more noticeable improvements for most of the estimands under the MNAR scenario, especially for the intercept of the linear model. Nevertheless, the performance of all of the methods under MNAR was subpar.

4. DISCUSSION

Our primary goal in the simulation study was to assess the performance of the SHRIMP method in missing-data problems where the data structure involves correlated observational units.
Typical application areas would be surveys with diverse sets of questionnaire items, possibly with skip patterns, where data collection follows a longitudinal design or a multistage sampling scheme. We did not consider the compatibility of our conditional imputation models, but rather conducted a simulation study involving relatively simple scenarios that allowed us not only to evaluate SHRIMP but also to compare it with several alternative approaches. In the scenarios considered, with moderate rates of missingness and reasonably rich imputation models consistent with the data structure, SHRIMP tended to outperform the alternatives, with higher accuracy and more efficiency. This finding is significant because in many surveys, practitioners face complexities, such as skip patterns/censoring and variables measured on different scales, that are targeted by sequential approaches to imputation. Although our simulation scenarios were limited, we believe that our study provides a reasonable initial reference point for more complex situations.

In all applications of imputation, users should be careful and diligent in building their imputation models. A general strategy is to include variables that would be good predictors of the missing values, that will be used in the postimputation phase, and that could explain missingness, to reduce potential biases due to nonresponse. Imputation models should also reflect design information and variables. In addition to careful consideration of variables, it is important to build imputation models carefully with respect to their functional forms, interactions among the variables, and other potential nonlinearities. For more information on building imputation models, see Rubin (1987, 1996), Reiter, Raghunathan, and Kinney (2006), and Bondarenko and Raghunathan (2016).

The goal of this paper was not to derive a computationally efficient algorithm, but rather to demonstrate the feasibility of the SHRIMP method. Since SHRIMP is computationally burdensome, one future topic would be to develop computationally efficient algorithms and fast ML estimation under generalized linear mixed-effects models in large data problems and in problems with richer random-effects structures.

We focused on random-intercept-only imputation models. This was motivated largely by computational feasibility (the approximations quickly become computationally expensive as the number of random effects increases) and by the view that random intercepts can be sufficient to capture cluster-specific variation. This sufficiency might be realized more often in settings with rich sets of auxiliary variables available as covariates. However, under certain conditions described by Lohr (2014), misspecification of the random-effects structure could lead to understated design effects.

Our current work focuses on several important extensions of SHRIMP. One extension is to incorporate imputation models for nominal, count, and semicontinuous variables, to further handle variables measured on different scales. We also plan to extend our algorithms so that higher levels of nesting are allowed (e.g., longitudinal assessments of students nested within schools). These extensions are important because missing values can take quite arbitrary patterns at potentially any level of observational units (e.g., school-level characteristics might be incomplete).
A final extension is to make the methods flexible enough to handle unique data features that are commonly seen in survey practice, such as bounds and restrictions, in addition to multiple types of missing data, such as when participants may either answer "I don't know" to a sensitive item or simply refuse to answer.

APPENDIX A: DETAILED ALGORITHM FOR THE LOGISTIC MIXED-EFFECTS REGRESSION MODEL

From the assumption of independence of the bi, i = 1, ..., m, it follows that the marginal likelihood (11) can be expressed as $\prod_{i=1}^{m} \int_{b_i} L_i(\theta \mid y_i, x_i, w_i, b_i) f_i(b_i)\, db_i$, where $L_i(\theta \mid y_i, x_i, w_i, b_i)$ is the factor in likelihood (10) corresponding to cluster i; yi, xi, and wi denote the data values in cluster i; and $f_i(b_i)$ is the $N(0, \sigma_b^2)$ density function. Thus, Gauss-Hermite quadrature approximates the marginal likelihood by

$$L(\theta \mid y, x, w) \approx \prod_{i=1}^{m} \sum_{h=1}^{H} L_i(\theta \mid y_i, x_i, w_i, b_i = B_h)\, W(B_h), \quad (A.1)$$

where Bh and W(Bh) are quadrature points and weights, respectively, and H is the number of quadrature points. From (1) to (3), (10), (11), and (A.1), it follows that the marginal log-likelihood is approximated as

$$l(\theta \mid y, x, w) \approx \sum_{i=1}^{m} \sum_{h=1}^{H} \sum_{j=1}^{n_i} W(B_h) \left[ y_{ij} \log \frac{F(\eta_{ihj})}{1 - F(\eta_{ihj})} + \log\left(1 - F(\eta_{ihj})\right) \right], \quad (A.2)$$

where $\eta_{ihj} = x_{ij}\beta + B_h$. The Fisher scoring method is employed to find the estimates that maximize the approximate marginal log-likelihood (A.2). A normal distribution $N(\hat{\theta}, C^{-1})$ is then built, where $\hat{\theta}$ is the maximum-likelihood estimate and C is the information matrix at convergence, and a Metropolis-Hastings algorithm is employed to sample from this proposal distribution. The samples are drawn with an acceptance probability equal to the minimum of the posterior ratio and 1. For each imputation of the missing values, a draw of the regression parameters $\theta = (\beta, \sigma_b)$ is obtained after the sampler chain reaches convergence. Then, for each cluster i, the random effect bi is drawn from its posterior distribution $N(0, \sigma_b^2)$. Finally, each $y_{ij} \in y_{\mathrm{mis}}$ is drawn from the following Bernoulli distribution:

$$y_{ij} \mid x, w, \beta, \sigma_b, b_1, \ldots, b_m \sim \mathrm{Bernoulli}\left(G(x_{ij}\beta + b_i)\right), \quad \text{where } G(\eta) = \mathrm{logit}^{-1}(\eta).$$
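The quadrature step (A.1) is easy to express in code. The following Python sketch evaluates the Gauss-Hermite approximation to the marginal log-likelihood of the random-intercept logistic model, taking the log of each cluster's quadrature sum directly rather than using the rearranged form (A.2); the node rescaling assumes numpy's physicists' Hermite rule, and all names are illustrative.

```python
# Gauss-Hermite approximation to the marginal log-likelihood, cf. (A.1).
import numpy as np

def gh_loglik(beta, sigma_b, y, X, cluster, H=15):
    nodes, weights = np.polynomial.hermite.hermgauss(H)
    B = np.sqrt(2.0) * sigma_b * nodes      # rescale nodes to N(0, sigma_b^2)
    W = weights / np.sqrt(np.pi)            # corresponding quadrature weights
    xb = X @ beta
    ll = 0.0
    for i in np.unique(cluster):
        idx = cluster == i
        eta = xb[idx, None] + B[None, :]    # eta_ihj = x_ij beta + B_h
        p = 1.0 / (1.0 + np.exp(-eta))
        # per-node log-likelihood of cluster i, summed over its subjects
        log_lik_i = (y[idx, None] * np.log(p)
                     + (1 - y[idx, None]) * np.log1p(-p)).sum(axis=0)
        # direct exponentiation is adequate for small clusters; use
        # log-sum-exp for larger ones
        ll += np.log(np.sum(W * np.exp(log_lik_i)))
    return ll
```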
APPENDIX B: ADDITIONAL SIMULATION RESULTS, INCLUDING COMPARISONS WITH JOMO

Table B.1. Summary of the Simulation Results when the Missingness Mechanism Is MAR, with Missingness Rates of Approximately 30% for Each Variable and 50% Jointly, and with ICC = 0.2

| PI | Method | TV | AE | PB | CR | RMSE | AW |
|----|--------|--------|--------|-------|------|--------|--------|
| α0 | SHRIMP | −1.2536 | −1.2374 | 1.29 | 93.9 | 0.4122 | 1.3090 |
| | PAN | −1.2536 | −1.1097 | 11.48 | 91.0 | 0.4267 | 1.4705 |
| | MICE | −1.2536 | −1.3114 | 4.61 | 82.5 | 0.3588 | 1.0912 |
| | jomo | −1.2536 | −1.2734 | 1.57 | 93.4 | 0.4204 | 1.3101 |
| α1 | SHRIMP | 5.2489 | 5.2743 | 0.48 | 94.9 | 0.3936 | 1.6645 |
| | PAN | 5.2489 | 4.8872 | 6.89 | 90.1 | 0.5117 | 1.7465 |
| | MICE | 5.2489 | 5.4462 | 3.76 | 88.3 | 0.4693 | 1.5162 |
| | jomo | 5.2489 | 5.2898 | 0.78 | 95.1 | 0.3978 | 1.6534 |
| β0 | SHRIMP | −1.0091 | −0.9625 | 4.62 | 94.3 | 0.2295 | 0.8366 |
| | PAN | −1.0091 | −0.8504 | 15.72 | 91.1 | 0.2607 | 0.8872 |
| | MICE | −1.0091 | −1.0089 | 0.02 | 84.6 | 0.2536 | 0.6945 |
| | jomo | −1.0091 | −0.9301 | 7.82 | 86.6 | 0.2399 | 0.7245 |
| β1 | SHRIMP | 0.5121 | 0.4783 | 6.60 | 93.4 | 0.0790 | 0.2550 |
| | PAN | 0.5121 | 0.4289 | 16.25 | 87.0 | 0.1002 | 0.2458 |
| | MICE | 0.5121 | 0.5056 | 1.26 | 83.2 | 0.0865 | 0.2305 |
| | jomo | 0.5121 | 0.4899 | 4.33 | 94.9 | 0.0710 | 0.2281 |
| Q | SHRIMP | 0.3877 | 0.3843 | 0.86 | 93.7 | 0.0362 | 0.1287 |
| | PAN | 0.3877 | 0.3779 | 2.52 | 90.6 | 0.0364 | 0.1317 |
| | MICE | 0.3877 | 0.3888 | 0.30 | 83.1 | 0.0371 | 0.1172 |
| | jomo | 0.3877 | 0.3835 | 1.08 | 92.9 | 0.0333 | 0.1221 |

Imputation model for Y1 is Y1 = λ0 + λ1·Y2 + λ2·Z1 + λ3·Z2 + bY1 + ε.
Imputation model for Y2 is logit(P(Y2 = 1)) = ξ0 + ξ1·Y1 + ξ2·Z1 + ξ3·Z2 + bY2.

Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.

REFERENCES

Abramowitz, M., and Stegun, I. A., eds. (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (9th printing), New York: Dover.
Albert, J. (2007), Bayesian Computation with R, New York: Springer.
Andridge, R. R. (2011), "Quantifying the Impact of Fixed Effects Modeling of Clusters in Multiple Imputation for Cluster Randomized Trials," Biometrical Journal, 53, 57-74.
Bondarenko, I., and Raghunathan, T. (2016), "Graphical and Numerical Diagnostic Tools to Assess Suitability of Multiple Imputations and Imputation Models," Statistics in Medicine, 35, 3007-3020.
Carpenter, J. R., Goldstein, H., and Kenward, M. G. (2011), "REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types," Journal of Statistical Software, 45, 1-14.
Collins, L. M., Schafer, J. L., and Kam, C. M. (2001), "A Comparison of Inclusive and Restrictive Strategies in Modern Missing Data Procedures," Psychological Methods, 6, 330-351.
Demirtas, H., Freels, S. A., and Yucel, R. M. (2008), "The Plausibility of Multivariate Normality Assumption when Multiply Imputing Non-Gaussian Continuous Outcomes: A Simulation Assessment," Journal of Statistical Computation and Simulation, 78, 69-84.
Gelman, A. (2006), "Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper)," Bayesian Analysis, 1, 515-534.
Graham, J. W., Olchowski, A. E., and Gilreath, T. D. (2007), "How Many Imputations Are Really Needed? Some Practical Clarifications of Multiple Imputation Theory," Prevention Science, 8, 206-213.
Hedeker, D., and Gibbons, R. (1994), "A Random-Effects Ordinal Regression Model for Multilevel Analysis," Biometrics, 50, 933-944.
Jennrich, R. I., and Schluchter, M. D. (1986), "Unbalanced Repeated-Measures Models with Structured Covariance Matrices," Biometrics, 42, 805-820.
Kropko, J., Goodrich, B., Gelman, A., and Hill, J.
( 2014), “ Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches,” Political Analysis , 22, 497– 519. Google Scholar CrossRef Search ADS   Jennrich R. I., Schluchter M. D. ( 1986), “ Unbalanced Repeated-Measures Models with Structured Covariance Matrices,” Biometrics , 42, 805– 820. Google Scholar CrossRef Search ADS PubMed  Laird N. M., Ware J. H. ( 1982), “ Random-Effects Models for Longitudinal Data,” Biometrics , 38, 963– 974. Google Scholar CrossRef Search ADS PubMed  Lee K. J., Carlin J. B. ( 2010), “ Multiple Imputation for Missing Data: Fully Conditional Specification versus Multivariate Normal Imputation,” American Journal of Epidemiology , 171, 624– 632. Google Scholar CrossRef Search ADS PubMed  Little R., Rubin D. B. ( 2002), Statistical Analysis with Missing Data  ( 2nd ed.), New York: John Wiley & Sons. Google Scholar CrossRef Search ADS   Liu J., Gelman A., Hill J., Su Y. S., Kropko J. ( 2014), “ On the Stationary Distribution of Iterative Imputations,” Biometrika , 101 1, 155– 173. Google Scholar CrossRef Search ADS   Lohr S. L. ( 2014), “ Design Effects for a Regression Slope in a Cluster Sample,” Journal of Survey Statistics and Methodology , 2, 97– 125. Google Scholar CrossRef Search ADS   Mccoloch C. E., Searle S. R. ( 2001), Generalized, Linear, and Mixed Models , New York: John Wiley & Sons. Meng X. L. ( 1994), “Multiple-Imputation Inference with Uncongenial Sources of Input” (with discussion), Statistical Science , 9, 538– 573. Google Scholar CrossRef Search ADS   Olsen M., Schafer J. L. ( 2001), “ A Two-Part Random-Effects Model for Semicontinuous Longitudinal Data,” Journal of the American Statistical Association , 96, 730– 745. Google Scholar CrossRef Search ADS   Quartagno M., Carpenter J. ( 2016), jomo: A Package for Multilevel Joint Modelling Multiple Imputation, Available at http://cran.r-project.org/package=jomo. Raghunathan T. E., Lepkowski M., Van Hoewyk J., Solenberger P. ( 2001), “ A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models,” Survey Methodology , 27, 85– 95. Raudenbush S., Bryke A. ( 2002), Hierarchical Linear Models: Applications and Data Analysis Methods  ( 2nd ed.), Thousand Oaks, CA: Sage. Raudenbush S., Yang M., Yosef M. ( 2000), “ Maximum Likelihood for Generalized Linear Models with Nested Random Effects via High-Order, Multivariate Laplace Approximation,” Journal of Computational and Graphical Statistics , 9, 141– 157. Resche-Rigon M., White I. R. ( 2016), “ Multiple Imputation by Chained Equations for Systematically and Sporadically Missing Multilevel Data,” Statistical Methods in Medical Research . Reiter J. P., Raghunathan T. E. ( 2007), “ The Multiple Adaptations of Multiple Imputation,” Journal of the American Statistical Association , 102, 1462– 1471. Google Scholar CrossRef Search ADS   Reiter J. P., Raghunathan E., Kinney S. K. ( 2006), “ Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data,” Survey Methodology , 32, 143– 150. Royston P., White I. R. ( 2011), “ Multiple Imputation by Chained Equations (MICE): Implementation in Stata,” Journal of Statistical Software , 45, 1– 20. Google Scholar CrossRef Search ADS   Rubin D. B. ( 1976), “ Inference and Missing Data” (with discussion), Biometrika , 63, 581– 592. Google Scholar CrossRef Search ADS   Rubin D. B. ( 1987), Multiple Imputation for Nonresponse in Surveys , New York: John Wiley and Sons. 
Google Scholar CrossRef Search ADS   Rubin D. B. ( 1996), “ Multiple Imputation After 18+ Years (with Discussion),” Journal of the American Statistical Association , 91, 473– 489. Google Scholar CrossRef Search ADS   Rubin D. B., Schenker N. ( 1986), “ Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse,” Journal of the American Statistical Association , 81, 366– 374. Google Scholar CrossRef Search ADS   Schafer J. L. ( 1997), Analysis of Incomplete Data , London: Chapman & Hall. Schafer J. L. ( 1999), “ Multiple Imputation: A Primer,” Statistical Methods in Medical Research , 8, 3– 15. Google Scholar CrossRef Search ADS PubMed  Schafer J. L., Yucel R. M. ( 2002), “ Computational Strategies for Multivariate Linear Mixed-Effects Models with Missing Values,” Journal of Computational and Graphical Statistics , 11, 421– 442. Google Scholar CrossRef Search ADS   Van Buuren S. ( 2007), “ Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification,” Statistical Methods in Medical Research , 16, 219– 242. Google Scholar CrossRef Search ADS PubMed  Van Buuren S. ( 2012), Flexible Imputation of Missing Data , London, CRC Press. Google Scholar CrossRef Search ADS   Van Buuren S., Brand J.P., Groothuis-Oudshoorn C G.M., Rubin D.B. ( 2006), Fully conditional specification in multivariate imputation. Journal of statistical computation and simulation , 76 12, 1049– 1064. Google Scholar CrossRef Search ADS   Van Buuren S., Groothuis-Oudshoorn C. ( 2011), “ MICE: Multivariate Imputation by Chained Equations in R,” Journal of Statistical Software , 45 3, 1– 67. Yucel R.M., He Y., Zaslavsky A.M. ( 2008), Using calibration to improve rounding in imputation. The American Statistician , 62 2, pp. 125– 129. Google Scholar CrossRef Search ADS   Yucel R.M., He Y., Zaslavsky A.M. ( 2011), Gaussian–based routines to impute categorical variables in health surveys. Statistics in medicine , 30 29, 3447– 3460. Google Scholar CrossRef Search ADS PubMed  © The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Survey Statistics and Methodology Oxford University Press

More recently, van Buuren and Groothuis-Oudshoorn (2011) modified the R package MICE to include random effects for continuous variables via linear mixed-effects models; see also van Buuren (2012). Resche-Rigon and White (2016) studied a similar algorithm in more detail and proposed new imputation methods, based on variable-by-variable algorithms, for multilevel normal data. However, similar models for categorical variables have not emerged within the context of variable-by-variable imputation. A joint modeling approach appropriate for clustered data has been developed through multivariate extensions of linear mixed-effects models by Schafer and Yucel (2002); this approach has also been implemented as an R package, "PAN." Handling categorical variables with this approach is possible only through continuous approximations to the variables, which can fail in certain instances (Yucel, He, and Zaslavsky 2008, 2011). In another implementation, "REALCOM" within MLWin, Carpenter, Goldstein, and Kenward (2011) make use of MCMC techniques to draw multiple imputations in multilevel applications for mixtures of variables, based on a joint model with normal latent variables underlying the categorical variables. REALCOM can be prone to computational problems as the number of categorical variables increases. In its implementation as an R package, "jomo" (Quartagno and Carpenter 2016), this computational aspect has been significantly improved. As mentioned previously, however, such a joint modeling approach can be difficult or infeasible when the data have skip patterns and/or bounds.

Figure 1. Clustered Incomplete Data with Skip Patterns.

In this paper, we introduce a different approach to incorporating random effects for discrete outcomes in imputation for missing data of mixed types. Our method uses sequential imputation with a normal linear mixed model for each continuous outcome and a generalized linear mixed model for each discrete outcome. We refer to the approach as sequential hierarchical regression imputation, or "SHRIMP." We investigate, via simulation, the potential gains from (i) fully incorporating random effects to represent clustering and (ii) modeling discrete outcomes using models that are more appropriate for them. Although variable-by-variable methods are most useful with high-dimensional data, several different measurement scales, skip patterns, and bounds, we consider a simple case involving a combination of continuous and binary variables, both for computational simplicity and to facilitate comparisons with methods that are not designed to handle such complexities. Section 2 of the paper outlines our variable-by-variable imputation approach. Models used to impute missing values in the individual variables are stated, and computational methods approximating the posterior predictive distribution under each of these models are also discussed. Section 3 discusses the simulation study, which assesses the performance of SHRIMP and contrasts SHRIMP with the PAN, MICE, and (briefly) jomo approaches. Finally, section 4 discusses the strengths and limitations of our approach and areas for further research.

2. SEQUENTIAL IMPUTATION

Let Z denote a data matrix containing completely observed auxiliary variables and $Y_k$ (k = 1, 2, …, r) denote a set of incompletely observed continuous or categorical variables.
Without loss of generality, we assume that these variables are ordered by their amounts of missing values, from least to most. We model each variable $Y_k$ sequentially using an appropriate regression model. The resulting set of regression models forms the basis for approximating the posterior predictive distribution. Consider, for example, a survey where income, education, and race are incompletely observed. The sequential imputation process models each of these variables with an appropriate regression model using all other variables, whether subject to missingness or not. The process samples the missing values for one variable at a time, using the other, completed or fully observed, variables as covariates. Sections 2.1 and 2.2 below explain how this proceeds computationally. SHRIMP enhances sequential imputation approaches by applying mixed-effects models to handle clustering, whether due to design (multistage sampling or longitudinal data collection) or to a naturally occurring hierarchy (e.g., patients within clinics or hospitals). Thus, distinct sources of variation are incorporated at the imputation stage. The mixed-effects models are designed to be congenial with survey inferences, which are often based only on cluster-level (e.g., PSU-level) variability (Meng 1994; Reiter, Raghunathan, and Kinney 2006). As in previous variable-by-variable implementations, SHRIMP proceeds in a cyclical manner, where in each cycle the conditional posterior predictive distribution derived under the corresponding mixed-effects model is sampled. We denote these distributions by $f_{Y_k}(Y_k \mid Z, Y_{-k}, \theta_{Y_k})$, where $Y_{-k} = \{Y_j : j \neq k\}$ denotes all the variables other than $Y_k$, Z denotes the fully observed variables, and $\theta_{Y_k}$ denotes the unknown parameters of the conditional distribution. Each of these conditionals is modeled by an appropriate regression model (see section 2.1). Missing values are drawn from their conditional posterior predictive distributions given the observed and imputed values (see section 2.2). For additional computational details, see Appendix A. One set of imputations for the missing values is drawn after a predetermined number c of cycles, which is assumed to yield an adequate approximation to the marginal posterior predictive distribution of the missing data. In the first cycle, starting from the variable with the least missing values, we regress each variable $Y_k$ on all other completely observed variables and impute the missing values. After this initial cycle imputes all missing values, the second cycle draws the imputed values for each variable with missingness based on the regression model specified for the variable, with the other Z and Y variables as covariates. Missing values are sampled from approximate conditional posterior predictive distributions. As each variable is imputed, previously imputed values are overwritten with the new ones. After c cycles are completed, we take the final imputed values as the set of imputations. Multiple imputations are created by repeating the c-cycle imputation process multiple times. Inferences from the multiply imputed data are conducted following the MI combining rules (Rubin 1987). Assuming a good approximation to the posterior predictive distribution of the missing data, those combining rules will lead to inferences that have the desired properties (e.g., unbiasedness, nominal coverage) for a sufficiently large sample size and number of imputations (Rubin 1987; Rubin and Schenker 1986; Reiter and Raghunathan 2007).
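To make the cycling concrete, the following sketch outlines the c-cycle pass in Python. It is a minimal illustration, not the SHRIMP implementation; the helper fit_and_draw, which stands in for the model-specific posterior-predictive draws of sections 2.2.1 and 2.2.2, is hypothetical.

```python
import numpy as np

def sequential_impute(Y, Z, cluster, fit_and_draw, n_cycles=10, rng=None):
    """One set of imputations via variable-by-variable cycling.

    Y: dict of 1-D float arrays (np.nan marks missing), ordered from the
       variable with the least missing values to the one with the most.
    Z: fully observed covariate matrix; cluster: integer cluster codes.
    fit_and_draw: hypothetical helper that fits the appropriate
       mixed-effects model and returns posterior-predictive draws for
       the flagged cases (sections 2.2.1 and 2.2.2).
    """
    rng = np.random.default_rng() if rng is None else rng
    miss = {k: np.isnan(v) for k, v in Y.items()}
    names = list(Y)
    # Initial cycle: regress each Y_k on Z and the variables completed so far.
    for pos, k in enumerate(names):
        covs = [Z] + [Y[j] for j in names[:pos]]
        Y[k][miss[k]] = fit_and_draw(Y[k], covs, cluster, miss[k], rng)
    # Subsequent cycles: condition on Z and all other (now complete) Y's,
    # overwriting the previous imputations with fresh draws.
    for _ in range(n_cycles - 1):
        for k in names:
            covs = [Z] + [Y[j] for j in names if j != k]
            Y[k][miss[k]] = fit_and_draw(Y[k], covs, cluster, miss[k], rng)
    return Y
```

Repeating the whole routine I times with independent chains yields the multiple imputations that enter the MI combining rules.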
Postulating a model for imputing from the implied posterior predictive distribution of the missing data is the most crucial stage of the imputation process. This model should be capable of preserving design features and relationships among incomplete and complete variables that may be subject to current or future analyses. Further, to the extent possible, the model should be rich enough to account for causes of missingness, to increase the plausibility of an underlying missing at random (MAR) assumption. The naming convention for missingness mechanisms is due to Rubin (1976); MAR refers to missingness probabilities that depend on the observed data (covariate and/or response) but not on the missing data. See Schafer (1997) and Little and Rubin (2002) for more discussion of imputation models as well as missingness mechanisms.

2.1 Conditional Regression Models

Below we describe the models used to impute individual variables conditional on covariates. Here we consider only models appropriate for single-level clustering. Extensions to handle additional nested and/or non-nested levels of clustering are straightforward; however, care should be taken to avoid computational complexities due to intractability and lack of convergence (see Appendix A for the computational algorithm for simulating the posterior predictive distribution under a logistic mixed-effects regression model). Let $y_{ij}$ denote a value of a random variable Y (continuous, categorical, or count) for subject j = 1, 2, …, $n_i$ in cluster i = 1, 2, …, m. We assume the link function

$$ g\{E(y_{ij} \mid x_{ij}, w_i, \theta)\} = \eta_{ij}, \qquad (1) $$

where g(·) links the expected value of the response $y_{ij}$ to the linear predictor

$$ \eta_{ij} = x_{ij}\beta + w_i b_i, \qquad (2) $$

and $x_{ij}$ and $w_i$ are vectors of covariates corresponding to individual- and cluster-specific characteristics, $\beta$ and $b_i$ are the corresponding regression coefficients (population-average fixed effects and cluster-specific random effects, respectively), and $\theta$ denotes the unknown parameters, such as regression coefficients and variance components, governing the underlying model. In the remainder of this paper, we assume a random-intercept-only model, so that $w_i \equiv 1$ and the $b_i$ are scalars, with $b_i \sim N(0, \sigma_b^2)$ independently for i = 1, 2, …, m. In applications with clustered survey data, it is often sufficient to assume a random-intercept model to account for cluster-specific variation. In other applications, such as longitudinal studies, it may be necessary to include additional structure, such as random slopes, to reflect variation across subjects. The computational algorithms presented here pertain only to binary and continuous variables. Other models are straightforward extensions that follow the same computational logic and are discussed in section 4. For binary $y_{ij}$, we use

$$ g\{P(y_{ij} = 1 \mid x_{ij}, w_i, \theta)\} = \log \frac{P(y_{ij} = 1 \mid x_{ij}, w_i, \theta)}{1 - P(y_{ij} = 1 \mid x_{ij}, w_i, \theta)}, \qquad (3) $$

and for continuous $y_{ij}$, we use the identity link

$$ g\{E(y_{ij} \mid x_{ij}, w_i, \theta)\} = E(y_{ij} \mid x_{ij}, w_i, \theta). \qquad (4) $$

2.2 Computational Algorithms

SHRIMP samples from the conditional posterior predictive distributions of the variables sequentially. These distributions are derived under the imposed regression models; for a binary variable, for example, a logistic mixed-effects regression model is used. As the implied posterior predictive distribution is not tractable, we employ approximations that make use of weak prior distributions together with maximum likelihood estimates and the information around them. The use of priors helps to incorporate uncertainty about the parameters.
Below we describe the algorithms for linear and logistic mixed-effects regression models.

2.2.1 Linear Mixed-Effects Regression Models

Corresponding to (1), (2), and (4), our first imputation model, for a continuous variable $y_{ij}$, is a linear mixed-effects, random-intercept-only ($w_i \equiv 1$) model:

$$ y_{ij} = x_{ij}\beta + b_i + \varepsilon_{ij}, $$

where $x_{ij}$, the individual-level covariates, are either completely observed or imputed in the previous cycles of SHRIMP, and $\beta$ and $b_i$ are the fixed and random effects, respectively. Following standard practice, we assume normal distributions for the error terms and random effects: $\varepsilon_{ij} \sim N(0, \sigma_e^2)$ and $b_i \sim N(0, \sigma_b^2)$ independently for i = 1, 2, …, m and j = 1, 2, …, $n_i$. A Gibbs sampling algorithm is employed to simulate the joint posterior of $\beta, b_1, \ldots, b_m, \sigma_b, \sigma_e$ and the missing data ($y_{mis}$). To proceed with the Gibbs sampler, conditional posterior distributions for $\theta = (\beta, \sigma_b, \sigma_e)$ and $b_1, \ldots, b_m$ need to be derived. We assume inverse chi-square priors $\chi^{-2}_{\nu_b}$ and $\chi^{-2}_{\nu_e}$ for the variances $\sigma_b^2$ and $\sigma_e^2$, respectively, and improper uniform priors on the fixed effects. An algorithm similar to that given by Schafer and Yucel (2002) is adopted in our setting and proceeds by sampling iteratively from

$$ \beta \mid x, w, b_1, \ldots, b_m, \sigma_e^2, \sigma_b^2 \sim N(\hat\beta, V_{\hat\beta}), \qquad (5) $$
$$ \sigma_e^2 \mid x, w, b_1, \ldots, b_m, \beta, \sigma_b^2 \sim \frac{1 + \sum_{i,j} \hat\varepsilon_{ij}^2}{\chi^2_{\nu_e + N - 1}}, \qquad (6) $$
$$ b_i \mid x, w, \beta, \sigma_e^2, \sigma_b^2 \sim N(\hat b_i, V_{\hat b_i}), \qquad (7) $$
$$ \sigma_b^2 \mid x, w, b_1, \ldots, b_m, \beta, \sigma_e^2 \sim \frac{\nu_b + \sum_i b_i^2}{\chi^2_{\nu_b + m}}, \qquad (8) $$

where x and w denote the full sets of covariates (with $w_i \equiv 1$),

$$ V_{\hat\beta} = \sigma_e^2 \Big( \sum_{i,j} x_{ij}^T x_{ij} \Big)^{-1}, \qquad \hat\beta = \frac{V_{\hat\beta}}{\sigma_e^2} \sum_{i,j} x_{ij}^T (y_{ij} - b_i), \qquad \hat\varepsilon_{ij} = y_{ij} - x_{ij}\hat\beta - w_i b_i, $$
$$ N = \sum_{i=1}^m n_i, \qquad V_{\hat b_i} = \Big( \frac{n_i}{\sigma_e^2} + \frac{1}{\sigma_b^2} \Big)^{-1}, \qquad \hat b_i = \frac{V_{\hat b_i}}{\sigma_e^2} \sum_{j=1}^{n_i} (y_{ij} - x_{ij}\beta). $$

For a given set of values $\beta, \sigma_b, \sigma_e, b_1, \ldots, b_m$, each $y_{ij} \in y_{mis}$ can be drawn from

$$ y_{ij} \mid x, w, \beta, \sigma_e^2, \sigma_b^2, b_1, \ldots, b_m \sim N(x_{ij}\beta + b_i,\ \sigma_b^2 + \sigma_e^2). \qquad (9) $$

The steps given by (5) to (9) define one cycle of the Gibbs sampler, and executing such cycles repeatedly creates sequences of the unknown parameters $\{\theta^{(1)}, \theta^{(2)}, \ldots\}$ and $\{y_{mis}^{(1)}, y_{mis}^{(2)}, \ldots\}$, whose limiting distributions are $P(\theta \mid y_{obs}, x, w)$ and $P(y_{mis} \mid y_{obs}, x, w)$, respectively, where $y_{obs}$ denotes the observed Y-values. We use a draw of $y_{mis}$ as a single random sample from the posterior predictive distribution of the missing data upon convergence. We assess convergence by autocorrelation function plots and Gelman-Rubin statistics.

2.2.2 Logistic Mixed-Effects Regression Models

For a binary variable $y_{ij}$, based on the observed values, we first compute the maximum likelihood estimates of the parameters under the logistic regression model corresponding to (1) to (3). Then we use these estimates, along with the information matrix, to develop an approximation to the underlying posterior distribution of the unknown parameters and the posterior predictive distribution of the missing data. Specifically, we work with the likelihood function

$$ L(\theta, b_1, \ldots, b_m \mid y, x, w) = \prod_{i=1}^m \prod_{j=1}^{n_i} F(\eta_{ij})^{y_{ij}} \{1 - F(\eta_{ij})\}^{1 - y_{ij}}, \qquad (10) $$

where $\theta = (\beta, \sigma_b)$, y is the set of Y-values, and $F(\eta) = (1 + e^{-\eta})^{-1}$. Our algorithm requires maximizing the marginal likelihood function

$$ L(\theta \mid y, x, w) = \int_b L(\theta \mid y, x, w, b)\, f(b)\, db, \qquad (11) $$

where f(b) represents the probability distribution for the vector b of random effects, the components of which we assume to be independently distributed as $N(0, \sigma_b^2)$. Because there is no closed-form solution to this maximization problem, we employ Gauss-Hermite quadrature (Abramowitz and Stegun 1972), which is fast and accurate for a random-intercept-only model (Raudenbush, Yang, and Yosef 2000; Hedeker and Gibbons 1994); see Appendix A for computational details. A Laplace approximation might be preferable for a richer model such as a random-slope model.
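As an illustration, a compact sketch of the Gibbs cycle (5) to (9) for the linear mixed-effects model follows. It is a sketch under the updates as reconstructed above, not a reference implementation; clusters are assumed to be coded 0, …, m−1, and nu_e and nu_b stand for the small prior degrees of freedom of the inverse chi-square priors.

```python
import numpy as np

def gibbs_linear_mixed(y, X, cluster, n_iter=2000, nu_e=1.0, nu_b=1.0, rng=None):
    """Gibbs steps (5)-(8) for y_ij = x_ij beta + b_i + eps_ij, observed cases.

    y: response vector; X: design matrix; cluster: integer codes 0..m-1.
    Returns the last draw of (beta, b, sigma_e^2, sigma_b^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, p = X.shape
    m = int(cluster.max()) + 1
    n_i = np.bincount(cluster, minlength=m)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = np.zeros(m)
    sig_e2 = sig_b2 = 1.0
    for _ in range(n_iter):
        # (5): beta | rest ~ N(beta_hat, V_beta), with V_beta = sig_e2 (X'X)^{-1}
        beta_hat = XtX_inv @ (X.T @ (y - b[cluster]))
        beta = rng.multivariate_normal(beta_hat, sig_e2 * XtX_inv)
        # (6): sigma_e^2 | rest, a scaled inverse chi-square draw
        eps_hat = y - X @ beta_hat - b[cluster]
        sig_e2 = (1.0 + eps_hat @ eps_hat) / rng.chisquare(nu_e + N - 1)
        # (7): b_i | rest ~ N(b_hat_i, V_{b_i})
        V_b = 1.0 / (n_i / sig_e2 + 1.0 / sig_b2)
        resid_sum = np.bincount(cluster, weights=y - X @ beta, minlength=m)
        b = rng.normal((V_b / sig_e2) * resid_sum, np.sqrt(V_b))
        # (8): sigma_b^2 | rest
        sig_b2 = (nu_b + b @ b) / rng.chisquare(nu_b + m)
    return beta, b, sig_e2, sig_b2

def draw_missing(X_mis, cluster_mis, beta, b, sig_e2, sig_b2, rng):
    # (9): each y_ij in y_mis ~ N(x_ij beta + b_i, sigma_b^2 + sigma_e^2)
    return rng.normal(X_mis @ beta + b[cluster_mis], np.sqrt(sig_b2 + sig_e2))
```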
3. SIMULATION ASSESSMENT

By its setup, sequential or conditional sampling does not necessarily imply a random sample from a joint posterior predictive distribution of the missing data (Raghunathan, Lepkowski, VanHoewyk, and Solenberger 2001; Little and Rubin 2002; van Buuren, Brand, Groothuis-Oudshoorn, and Rubin 2006; van Buuren 2012). Our simulation study assesses the performance of SHRIMP in terms of frequentist properties. Further, the study compares SHRIMP with PAN, which indeed samples from a joint posterior predictive distribution but uses a continuous (normal) approximation for discrete variables to do so. The study also compares SHRIMP with MICE, which uses sequential regression imputation but incorporates random effects only for the continuous variables, not the discrete ones. The framework of the simulation study consists of complete-data generation from a joint mixed-effects model, imposing missing data under missing completely at random (MCAR) and missing at random (MAR) mechanisms, conducting MI inferences for model parameters and population quantities, and evaluating the MI inferences with respect to a set of criteria stated below. We focus on MCAR and MAR mechanisms because those mechanisms underlie the models we use. We also briefly mention results from a simulation involving incomplete data under a missing not at random (MNAR) mechanism, without presenting numerical results.

3.1 Study Design

3.1.1 Data Generation

Let Y1 and Y2 denote continuous and binary variables, respectively. The joint distribution [Y1, Y2] underlying the data generation mechanism was modeled via the marginal distribution of the continuous component and the conditional distribution of the binary component given the continuous component; that is, [Y1, Y2] = [Y1][Y2 | Y1]. Specifically, Y1 was first simulated under a linear model with varying intercepts:

$$ y_{1ij} = 1 + b_i^{Y_1} + \varepsilon_{ij}, \qquad (12) $$

where i = 1, 2, …, 50 and j = 1, 2, …, 10 index the clusters and units within a cluster, respectively, and where $b_i^{Y_1} \sim N(0, \sigma_{b_1}^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_e^2)$. To investigate behavior under varying intraclass correlations (ICC), we allowed $\sigma_{b_1}$ to vary from 0.5 to 2 and $\sigma_e$ to vary from 2 to 4. Next, we simulated Y2 conditional on Y1:

$$ \mathrm{logit}\{P(y_{2ij} = 1 \mid y_{1ij}, b_i^{Y_2})\} = -1 + 0.5\, y_{1ij} + b_i^{Y_2}, $$

where i and j are the same indices for clusters and subjects. Similar to (12), this model allowed variation of the intercepts across the clusters, with $b_i^{Y_2} \sim N(0, \sigma_{b_2}^2)$. ICC values similar to those in the continuous model were studied by varying $\sigma_{b_2}$ between 0.4 and 0.9 (for an ICC range between 0.05 and 0.2; we report results only for ICC = 0.05 and ICC = 0.2). Next, we simulated variables that were used in imposing missing values for the MAR mechanism. These variables were also used as auxiliary variables in the imputation models. We simulated $Z_1 \mid b_i^{Z_1} \sim N(b_i^{Z_1}, 2)$ and $Z_2 \mid b_i^{Z_2} \sim N(b_i^{Z_2}, 2)$, where i indexes the clusters, and $b_i^{Z_1}$ and $b_i^{Z_2}$ were distributed independently as N(0, 1). The next step of data generation pertains to imposing missing values under the MCAR and MAR mechanisms (Little and Rubin 2002). Let r1 and r2 denote missingness indicators for y1 and y2. Under MCAR, r1 and r2 were simulated from a Bernoulli distribution, with the success probability set to 20 percent, 30 percent, and 40 percent to study the behavior under these rates of missingness.
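The complete-data generation and the MCAR step can be written out directly. The sketch below follows (12) and the conditional logistic model; it treats the second parameter of the normal distributions for Z1 and Z2 as a variance, which is an assumption, since the text does not state whether it is a variance or a standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 10                            # 50 clusters, 10 units each
cluster = np.repeat(np.arange(m), n)
sig_b1, sig_e, sig_b2 = 1.0, 2.0, 0.9    # one choice within the studied ranges

# (12): continuous Y1 with cluster-varying intercepts
b_y1 = rng.normal(0.0, sig_b1, m)
y1 = 1.0 + b_y1[cluster] + rng.normal(0.0, sig_e, m * n)

# Binary Y2 given Y1: logit P(Y2 = 1) = -1 + 0.5 Y1 + b_i^{Y2}
b_y2 = rng.normal(0.0, sig_b2, m)
p2 = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * y1 + b_y2[cluster])))
y2 = rng.binomial(1, p2).astype(float)

# Auxiliary variables with cluster-level means b^{Z1}, b^{Z2} ~ N(0, 1)
z1 = rng.normal(rng.normal(0.0, 1.0, m)[cluster], np.sqrt(2.0))
z2 = rng.normal(rng.normal(0.0, 1.0, m)[cluster], np.sqrt(2.0))

# MCAR: each value independently missing with probability 0.2 (also 0.3, 0.4)
r1 = rng.random(y1.size) < 0.2
r2 = rng.random(y2.size) < 0.2
y1_incomplete = np.where(r1, np.nan, y1)
y2_incomplete = np.where(r2, np.nan, y2)
```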
Under MAR, the probability of missingness of y1 and y2 depended on the fully observed auxiliary variables Z1 and Z2:

$$ \mathrm{logit}\{P(r_1 = 1)\} = \gamma_0^{Y_1} + \gamma_1^{Y_1} Z_1, \qquad \mathrm{logit}\{P(r_2 = 1)\} = \gamma_0^{Y_2} + \gamma_1^{Y_2} Z_2, $$

where the $\gamma$ values were chosen to yield approximately 20 percent, 30 percent, and 40 percent rates of missingness.

3.1.2 Estimation and Performance

We gauge the performance of MI inferences under imputation via SHRIMP, PAN, and MICE with simple hypothetical analyst's models:

$$ y_{1ij} = \alpha_0 + \alpha_1 y_{2ij} + b_i^{Y_1} + \varepsilon_{ij}, \qquad (13) $$
$$ \mathrm{logit}\{P(y_{2ij} = 1 \mid y_{1ij}, b_i^{Y_2})\} = \beta_0 + \beta_1 y_{1ij} + b_i^{Y_2}, \qquad (14) $$

where $b_i^{Y_1} \sim N(0, \tau_{b_1}^2)$, $b_i^{Y_2} \sim N(0, \tau_{b_2}^2)$, and $\varepsilon_{ij} \sim N(0, \tau_e^2)$ are the conventional cluster-specific and residual error terms. (Note: To avoid extraneous notation across the data simulation, analysis, and imputation models, we use identical notation for the random effects and error terms; however, their values can differ across the models.) The parameters in these models are $\alpha = (\alpha_0, \alpha_1)$, $\beta = (\beta_0, \beta_1)$, $\tau_{b_1}^2$, $\tau_{b_2}^2$, and $\tau_e^2$. In addition to the parameters of (13) and (14), we examine the performance of the methods in estimating a bivariate summary measure of the data distribution: the probability of observing Y1 > 0 and Y2 = 1. Each of the estimands was also estimated for each simulated data set before deletion (BD) of any values, and the averages of the BD estimates across the simulations are used as the "true values." For each simulation scenario, a total of 10 imputations were created under each MI method, and estimates and their standard errors were combined using Rubin's rules (Rubin 1987). Our choice of 10 imputations was based on considerations of precision, not on the overall computing time; this and other considerations for choosing the number of imputations are discussed in Graham, Olchowski, and Gilreath (2007). The specific imputation models for each simulation scenario are given at the bottom of each table of results in the next section and in Appendix B. Simulating data sets, imposing missing values, creating MIs, and estimating from the multiply imputed data were repeated 1,000 times to create a repeated sampling environment. We gauged the performance of each method using the following criteria, each computed by averaging across the 1,000 replications.

Percent bias (PB): for an estimand T, PB is the magnitude of the raw bias relative to the true value of the estimand, calculated as $100 \times |(E(\hat T) - T)/T|$. A reasonable upper limit on PB for acceptable performance is 5 percent (Demirtas, Freels, and Yucel 2008).

Confidence-interval–based criteria: the coverage rate (CR) is the proportion of confidence intervals that contain the true estimand value; the actual rate should be close to the nominal rate if a procedure is working well. We consider performance to be poor if the actual CR drops below 90 percent for a 95 percent nominal rate (Collins, Schafer, and Kam 2001). We also report the average width of confidence intervals (AW), defined as the distance between the average lower and upper confidence limits. A high CR along with narrow confidence intervals translates into greater accuracy and higher power.

Root mean square error (RMSE): an integrated measure of bias and variance, defined by $\sqrt{E(\hat T - T)^2}$, which evaluates an estimator $\hat T$ in terms of combined accuracy and precision.
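These replication-level summaries are straightforward to compute. The sketch below assumes the MI point estimates and standard errors have already been pooled by Rubin's rules, and uses a normal critical value in place of the t reference distribution for simplicity.

```python
import numpy as np

def mi_criteria(est, se, truth, z=1.96):
    """PB, CR, AW, and RMSE for one estimand across simulation replications.

    est, se: arrays of Rubin-pooled MI point estimates and standard errors,
    one entry per replication; truth: the BD-based true value.
    """
    est, se = np.asarray(est), np.asarray(se)
    lo, hi = est - z * se, est + z * se
    return {
        "PB": 100.0 * abs(est.mean() - truth) / abs(truth),   # percent bias
        "CR": 100.0 * np.mean((lo <= truth) & (truth <= hi)), # coverage rate
        "AW": hi.mean() - lo.mean(),                          # average width
        "RMSE": np.sqrt(np.mean((est - truth) ** 2)),
    }
```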
3.2 Summary of the Results

Tables 1–4 display the true value (TV), average estimate across the simulations (AE), PB, CR, RMSE, and AW for the regression parameters of the analyst's models (13) and (14) and for the bivariate summary measure, the probability of observing Y1 > 0 and Y2 = 1, denoted by Q in tables 1–4. We report results for the MCAR and MAR mechanisms, a sample size of 500 with 50 clusters, and variations in the ICC and missingness rates.

Table 1. Summary of the Simulation Results when the Missingness Mechanism Is MCAR, with Missingness Rates of 20% for Each Variable and 36% Jointly

ICC = 0.05
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  0.3384   0.3300   2.48   93.6  0.1536  0.5247
α0  PAN     0.3384   0.3633   7.38   93.7  0.1541  0.5465
α0  MICE    0.3384   0.3283   2.98   91.1  0.1595  0.5137
α1  SHRIMP  1.6920   1.7090   1.01   94.7  0.2221  0.8288
α1  PAN     1.6920   1.6170   4.43   93.9  0.2242  0.8250
α1  MICE    1.6920   1.7101   1.07   90.5  0.2273  0.8081
β0  SHRIMP  −0.9958  −0.9967  0.08   94.9  0.1648  0.6255
β0  PAN     −0.9958  −0.9523  4.37   94.7  0.1611  0.6377
β0  MICE    −0.9958  −0.9950  0.09   90.1  0.1582  0.6206
β1  SHRIMP  0.4987   0.5031   0.89   94.9  0.0777  0.3012
β1  PAN     0.4987   0.4762   4.50   95.0  0.0744  0.2945
β1  MICE    0.4987   0.5053   1.32   89.4  0.0784  0.2952
Q   SHRIMP  0.3472   0.3480   0.23   94.5  0.0237  0.1001
Q   PAN     0.3472   0.3471   0.04   96.1  0.0226  0.1026
Q   MICE    0.3472   0.3493   0.59   90.4  0.0241  0.0995

ICC = 0.2
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  −1.2536  −1.3069  4.25   92.5  0.3006  1.1420
α0  PAN     −1.2536  −1.1766  6.14   94.0  0.3027  1.2430
α0  MICE    −1.2536  −1.3449  7.28   90.4  0.3141  1.0969
α1  SHRIMP  5.2489   5.3338   1.62   95.0  0.3520  1.4955
α1  PAN     5.2489   5.0159   4.44   95.6  0.3778  1.5390
α1  MICE    5.2489   5.4205   3.27   90.2  0.3812  1.4597
β0  SHRIMP  −1.0091  −0.9715  3.73   94.8  0.1858  0.7352
β0  PAN     −1.0091  −0.9044  10.38  93.7  0.1956  0.7593
β0  MICE    −1.0091  −0.9965  1.25   89.2  0.1871  0.7153
β1  SHRIMP  0.5121   0.4984   2.67   91.8  0.0596  0.2375
β1  PAN     0.5121   0.4629   9.60   87.9  0.0689  0.2324
β1  MICE    0.5121   0.5140   0.37   86.3  0.0627  0.2331
Q   SHRIMP  0.3877   0.3868   0.22   94.7  0.0322  0.1231
Q   PAN     0.3877   0.3820   1.45   94.2  0.0312  0.1249
Q   MICE    0.3877   0.3888   0.29   91.1  0.0321  0.1218

Imputation model for Y1: Y1 = λ0 + λ1 Y2 + b^{Y1} + ε. Imputation model for Y2: logit(P(Y2 = 1)) = ξ0 + ξ1 Y1 + b^{Y2}.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.
Table 2. Summary of the Simulation Results when the Missingness Mechanism Is MAR

ICC = 0.05, with missingness rate of 20% for each variable and 35% jointly
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  0.3384   0.3328   1.64   94.1  0.1527  0.5287
α0  PAN     0.3384   0.3568   5.43   93.9  0.1479  0.5524
α0  MICE    0.3384   0.3275   3.21   89.1  0.1505  0.4856
α1  SHRIMP  1.6920   1.6843   0.45   94.5  0.2027  0.8350
α1  PAN     1.6920   1.6019   5.33   91.9  0.2178  0.8396
α1  MICE    1.6920   1.6952   0.19   88.5  0.2320  0.7368
β0  SHRIMP  −0.9958  −0.9851  1.08   95.1  0.1492  0.6254
β0  PAN     −0.9958  −0.9375  5.86   95.0  0.1551  0.6393
β0  MICE    −0.9958  −0.9912  0.46   86.7  0.1643  0.5448
β1  SHRIMP  0.4987   0.4906   1.62   94.6  0.0726  0.3002
β1  PAN     0.4987   0.4672   6.30   92.5  0.0745  0.2954
β1  MICE    0.4987   0.4975   0.23   89.3  0.0817  0.2583
Q   SHRIMP  0.3472   0.3456   0.47   94.8  0.0226  0.1004
Q   PAN     0.3472   0.3454   0.52   94.9  0.0227  0.1021
Q   MICE    0.3472   0.3474   0.05   93.0  0.0252  0.0923

ICC = 0.2, with missingness rates of 22% for Y1, 23% for Y2, and 40% jointly
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  −1.2536  −1.2331  1.63   93.9  0.3137  1.1887
α0  PAN     −1.2536  −1.1414  8.95   94.2  0.3374  1.2840
α0  MICE    −1.2536  −1.3146  4.86   88.9  0.2998  1.0733
α1  SHRIMP  5.2489   5.2573   0.16   95.1  0.3254  1.5066
α1  PAN     5.2489   4.9972   4.80   94.4  0.4068  1.5670
α1  MICE    5.2489   5.3753   2.41   90.1  0.3655  1.4154
β0  SHRIMP  −1.0091  −0.9801  2.87   92.9  0.1970  0.7491
β0  PAN     −1.0091  −0.9107  9.75   93.1  0.2074  0.7734
β0  MICE    −1.0091  −0.9966  1.24   85.9  0.2031  0.6545
β1  SHRIMP  0.5121   0.4881   4.69   93.5  0.0587  0.2305
β1  PAN     0.5121   0.4560   10.95  88.7  0.0735  0.2278
β1  MICE    0.5121   0.5059   1.20   86.7  0.0631  0.2066
Q   SHRIMP  0.3877   0.3852   0.64   94.3  0.0349  0.1245
Q   PAN     0.3877   0.3813   1.64   93.4  0.0343  0.1257
Q   MICE    0.3877   0.3866   0.26   87.1  0.0367  0.1185

Imputation model for Y1: Y1 = λ0 + λ1 Y2 + λ2 Z1 + λ3 Z2 + b^{Y1} + ε. Imputation model for Y2: logit(P(Y2 = 1)) = ξ0 + ξ1 Y1 + ξ2 Z1 + ξ3 Z2 + b^{Y2}.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.
Table 3. Summary of the Simulation Results when the Missingness Mechanism Is MAR, with Missingness Rates of Approximately 30% for Each Variable and 50% Jointly

ICC = 0.05
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  0.3384   0.3476   2.72   93.7  0.1609  0.5901
α0  PAN     0.3384   0.3859   14.03  93.4  0.1571  0.6468
α0  MICE    0.3384   0.3485   3.00   89.3  0.1639  0.5118
α1  SHRIMP  1.6920   1.6646   1.62   94.4  0.2526  0.9161
α1  PAN     1.6920   1.5582   7.91   94.6  0.2618  0.9486
α1  MICE    1.6920   1.6698   1.31   87.6  0.2618  0.7901
β0  SHRIMP  −0.9958  −0.9898  0.60   94.7  0.1809  0.6991
β0  PAN     −0.9958  −0.9288  6.74   97.1  0.1673  0.7451
β0  MICE    −0.9958  −0.9941  0.18   87.1  0.1798  0.5731
β1  SHRIMP  0.4987   0.4859   2.55   94.6  0.0840  0.3316
β1  PAN     0.4987   0.4570   8.36   94.1  0.0842  0.3380
β1  MICE    0.4987   0.4924   1.26   84.4  0.0903  0.2756
Q   SHRIMP  0.3472   0.3436   1.03   94.1  0.0297  0.1086
Q   PAN     0.3472   0.3448   0.70   94.0  0.0279  0.1128
Q   MICE    0.3472   0.3451   0.61   90.1  0.0290  0.0954

ICC = 0.2
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  −1.2536  −1.2374  1.29   93.9  0.4122  1.3090
α0  PAN     −1.2536  −1.1097  11.48  91.0  0.4267  1.4705
α0  MICE    −1.2536  −1.3114  4.61   82.5  0.3588  1.0912
α1  SHRIMP  5.2489   5.2743   0.48   94.9  0.3936  1.6645
α1  PAN     5.2489   4.8872   6.89   90.1  0.5117  1.7465
α1  MICE    5.2489   5.4462   3.76   88.3  0.4693  1.5162
β0  SHRIMP  −1.0091  −0.9625  4.62   94.3  0.2295  0.8366
β0  PAN     −1.0091  −0.8504  15.72  91.1  0.2607  0.8872
β0  MICE    −1.0091  −1.0089  0.02   84.6  0.2536  0.6945
β1  SHRIMP  0.5121   0.4783   6.60   93.4  0.0790  0.2550
β1  PAN     0.5121   0.4289   16.25  87.0  0.1002  0.2458
β1  MICE    0.5121   0.5056   1.26   83.2  0.0865  0.2305
Q   SHRIMP  0.3877   0.3843   0.86   93.7  0.0362  0.1287
Q   PAN     0.3877   0.3779   2.52   90.6  0.0364  0.1317
Q   MICE    0.3877   0.3888   0.30   83.1  0.0371  0.1172

Imputation model for Y1: Y1 = λ0 + λ1 Y2 + λ2 Z1 + λ3 Z2 + b^{Y1} + ε. Imputation model for Y2: logit(P(Y2 = 1)) = ξ0 + ξ1 Y1 + ξ2 Z1 + ξ3 Z2 + b^{Y2}.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.
Table 4. Summary of the Simulation Results when the Missingness Mechanism Is MAR, with Missingness Rates of Approximately 38% for Each Variable and 60% Jointly

ICC = 0.05
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  0.3384   0.3612   6.75   93.9  0.2039  0.6883
α0  PAN     0.3384   0.4218   24.66  94.1  0.2043  0.7910
α0  MICE    0.3384   0.3666   8.33   83.2  0.1811  0.5528
α1  SHRIMP  1.6920   1.6474   2.64   94.6  0.2504  1.0763
α1  PAN     1.6920   1.5039   11.11  93.7  0.2925  1.1008
α1  MICE    1.6920   1.6304   3.64   88.7  0.2780  0.9223
β0  SHRIMP  −0.9958  −0.9921  0.37   91.7  0.2354  0.8121
β0  PAN     −0.9958  −0.9240  7.22   91.6  0.2089  0.9089
β0  MICE    −0.9958  −0.9825  1.34   87.8  0.2160  0.6484
β1  SHRIMP  0.4987   0.4825   3.25   95.0  0.1000  0.3882
β1  PAN     0.4987   0.4411   11.55  92.5  0.1005  0.3936
β1  MICE    0.4987   0.4803   3.68   86.3  0.1079  0.3281
Q   SHRIMP  0.3472   0.3424   1.38   92.3  0.0329  0.1174
Q   PAN     0.3472   0.3418   1.57   93.9  0.0287  0.1278
Q   MICE    0.3472   0.3422   1.44   90.2  0.0283  0.1018

ICC = 0.2
PI  Method  TV       AE       PB     CR    RMSE    AW
α0  SHRIMP  −1.2536  −1.2760  1.79   94.7  0.4013  1.4241
α0  PAN     −1.2536  −1.0847  13.48  93.9  0.4366  1.6360
α0  MICE    −1.2536  −1.3336  6.38   82.1  0.3687  1.1620
α1  SHRIMP  5.2489   5.2610   0.23   95.0  0.4336  1.8882
α1  PAN     5.2489   4.7939   8.67   91.4  0.6115  2.0009
α1  MICE    5.2489   5.3678   2.26   86.5  0.5192  1.7143
β0  SHRIMP  −1.0091  −0.9504  5.81   92.9  0.2838  0.9370
β0  PAN     −1.0091  −0.8349  17.26  91.3  0.2779  1.0414
β0  MICE    −1.0091  −0.9527  5.59   82.6  0.2736  0.7518
β1  SHRIMP  0.5121   0.4701   8.20   91.9  0.0817  0.2882
β1  PAN     0.5121   0.4121   19.51  67.9  0.1117  0.2740
β1  MICE    0.5121   0.4778   6.69   83.1  0.0889  0.2638
Q   SHRIMP  0.3877   0.3800   1.99   91.8  0.0394  0.1357
Q   PAN     0.3877   0.3723   3.96   91.1  0.0382  0.1398
Q   MICE    0.3877   0.3826   1.30   82.3  0.0393  0.1175

Imputation model for Y1: Y1 = λ0 + λ1 Y2 + λ2 Z1 + λ3 Z2 + b^{Y1} + ε. Imputation model for Y2: logit(P(Y2 = 1)) = ξ0 + ξ1 Y1 + ξ2 Z1 + ξ3 Z2 + b^{Y2}.
Note.— Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.

Tables 1 and 2 present results for MCAR and MAR, respectively. The rates of missingness for Y1 and Y2 were set to approximately 20 percent marginally each and 36 percent jointly. Recall the acceptable limits defined in section 3.1.2: PB of at most 5 percent and CR of at least 90 percent. Generally, SHRIMP outperforms PAN and MICE with respect to PB, CR, and the other criteria: SHRIMP never violates either limit, whereas PAN exceeds the PB limit several times and occasionally falls below the CR limit, and MICE frequently falls below the CR limit. Biases such as those associated with PAN are typical for methods whose underlying imputation models assume Gaussian structure for binary variables whose observed distributions are asymmetric (Yucel, He, and Zaslavsky 2008). Although MICE leads to estimates with low bias, its failure to account for part of the cluster-level variation leads to inferences that are artificially too precise, in other words, narrower CIs and hence lower-than-nominal CRs. More often than not, SHRIMP has the lowest RMSE of the three methods, particularly when ICC = 0.2. SHRIMP also usually has a lower AW than does PAN. (We do not compare SHRIMP and MICE with respect to AW because of MICE's lower-than-nominal CRs.) The three methods perform most similarly under the MCAR mechanism with ICC = 0.05. The better performance of SHRIMP becomes more apparent when the missing data are MAR (which is more realistic than MCAR in most multivariate settings), as well as when the ICC increases.
Tables 3 and 4 display results under the MAR mechanism with higher rates of missingness (up to 60 percent). As might be expected, the performance of all of the methods degrades to some extent: SHRIMP and MICE now have a few PB values above the 5 percent limit, PAN's PBs tend to increase, and MICE's CRs tend to decrease. Nevertheless, the general trends described above for tables 1 and 2 persist. Moreover, the better performance of SHRIMP, particularly relative to PAN regarding PB and to MICE regarding CR, becomes more pronounced. Although not shown here, we observed quite dramatic changes in the performance of the methods when the missingness mechanism was MNAR. SHRIMP performed the best for most of the estimands and was outperformed only by PAN with respect to inference for the intercept parameters. However, none of the methods performed particularly well under MNAR, as would be expected given that the methods assume MAR. In a limited simulation study, we also compared the performance of our methods with that of the recently implemented R package jomo (Quartagno and Carpenter 2016), which was described in section 1. For settings with ICC = 0.2, MAR, and around 50 percent joint missingness, a summary of the results is provided in Appendix B. Overall, jomo and SHRIMP performed similarly with respect to the criteria given in section 3.1.2, although jomo had a slightly lower coverage rate, along with higher bias and RMSE, for β0. When joint modeling is possible, jomo presents a viable software tool for conducting MI inference. We note several points. The first is the role of the priors, which is negligible in most settings. We compared the inverse chi-square prior and the half-t family of priors on the variances of the random-effect terms (Gelman 2006); the results (not shown here) indicate no discernible difference due to the selection of priors. The second pertains to the goodness of fit of the imputation models. We assessed the fit of the imputation models against the complete simulated data, and the models fit the data well. The third point relates to improving the coherence of the variable-by-variable procedures with the underlying joint predictive distribution of the missing data. To investigate the sensitivity to incoherence with regard to the random effects, we conducted additional computations with procedures that augmented each conditional imputation model with the predicted random effects from the other conditional models in each cycle, to improve the overall consistency with the joint modeling approach. The augmented imputation models in imputation cycle c are

$$ y_{1ij} = \lambda_0 + \lambda_1 y_{2ij} + \lambda_2 z_{1ij} + \lambda_3 z_{2ij} + \lambda_4 b_i^{Y_2(c-1)} + b_i^{Y_1} + \varepsilon_{ij}, $$
$$ \mathrm{logit}\{P(y_{2ij} = 1 \mid y_{1ij}, b_i^{Y_2})\} = \xi_0 + \xi_1 y_{1ij} + \xi_2 z_{1ij} + \xi_3 z_{2ij} + \xi_4 b_i^{Y_1(c)} + b_i^{Y_2}, $$

where the parenthetical superscripts c − 1 and c indicate the cycle at which the random-effect estimate was obtained. Our simulation-based investigation showed that the improvements due to using the augmented imputation models were not substantial under the MCAR and MAR scenarios, although there were more noticeable improvements for most of the estimands under the MNAR scenario, especially for the intercept of the linear model. Nevertheless, the performance of all of the methods under MNAR was subpar.

4. DISCUSSION

Our primary goal in the simulation study was to assess the performance of the SHRIMP method in missing-data problems where the data structure involves correlated observational units.
Typical application areas are surveys with diverse questionnaire items, possibly subject to skip patterns, in which data collection follows a longitudinal design or a multistage sampling scheme. We did not consider the compatibility of our conditional imputation models, but rather conducted a simulation study involving relatively simple scenarios that allowed us not only to evaluate SHRIMP but also to compare it with several alternative approaches. In the scenarios considered, with moderate rates of missingness and reasonably rich imputation models consistent with the data structure, SHRIMP tended to outperform the alternatives, with higher accuracy and efficiency. This finding is significant because in many surveys, practitioners face complexities such as skip patterns, censoring, and variables measured on different scales, which are precisely the features targeted by sequential approaches to imputation. Although our simulation scenarios were limited, we believe that our study provides a reasonable initial reference point for more complex situations. In all applications of imputation, users should be careful and diligent in building their imputation models. A general strategy is to include variables that are good predictors of the missing values, that will be used in the postimputation phase, and that could explain missingness, thereby reducing potential biases due to nonresponse. Imputation models should also reflect design information and design variables. In addition to careful selection of variables, it is important to build imputation models carefully with respect to their functional forms, interactions among the variables, and other potential nonlinearities. For more information on building imputation models, see Rubin (1987, 1996), Reiter, Raghunathan, and Kinney (2006), and Bondarenko and Raghunathan (2016). The goal of this paper was not to derive a computationally efficient algorithm, but rather to demonstrate the feasibility of the SHRIMP method. Since SHRIMP is computationally burdensome, one avenue for future work is the development of computationally efficient algorithms and fast maximum-likelihood estimation for generalized linear mixed-effects models in large-data problems and in problems with richer random-effects structures. We focused on random-intercept-only imputation models. This was motivated largely by computational feasibility (approximations quickly become computationally expensive as the number of random effects increases) and by the view that random intercepts could be sufficient to capture cluster-specific variation. This sufficiency is more likely to be realized in settings with rich sets of auxiliary variables available as covariates. However, under certain conditions described by Lohr (2014), misspecification of the random-effects structure could lead to understated design effects. Our current work focuses on several important extensions of SHRIMP. One extension is to incorporate imputation models for nominal, count, and semicontinuous variables, to further handle variables measured on different scales. We also plan to extend our algorithms so that higher levels of nesting are allowed (e.g., longitudinal assessments of students nested within schools). These extensions are important because missing values can take quite arbitrary patterns at potentially any level of observational units (e.g., school-level characteristics might be incomplete).
A final extension is to make the methods flexible enough to handle unique data features commonly seen in survey practice, such as bounds and restrictions, as well as multiple types of missing data, as when participants either answer "I don't know" to a sensitive item or simply refuse to answer.

APPENDIX A: DETAILED ALGORITHM FOR THE LOGISTIC MIXED-EFFECTS REGRESSION MODEL

From the assumption of independence of the $b_i$, $i = 1, \ldots, m$, it follows that the marginal likelihood (11) can be expressed as
\[
\prod_{i=1}^{m} \int L_i(\theta \mid y_i, x_i, w_i, b_i)\, f_i(b_i)\, db_i,
\]
where $L_i(\theta \mid y_i, x_i, w_i, b_i)$ is the factor in likelihood (10) corresponding to cluster $i$; $y_i$, $x_i$, and $w_i$ denote the data values in cluster $i$; and $f_i(b_i)$ is the $N(0, \sigma_b^2)$ density function. Thus, Gauss-Hermite quadrature approximates the marginal likelihood by
\[
L(\theta \mid y, x, w) \approx \prod_{i=1}^{m} \sum_{h=1}^{H} L_i(\theta \mid y_i, x_i, w_i, b_i = B_h)\, W(B_h), \tag{A.1}
\]
where $B_h$ and $W(B_h)$ are the quadrature points and weights, respectively, and $H$ is the number of quadrature points. From (1) to (3), (10), (11), and (A.1), it follows that the marginal log-likelihood is approximated as
\[
\ell(\theta \mid y, x, w) \approx \sum_{i=1}^{m} \sum_{h=1}^{H} \sum_{j=1}^{n_i} W(B_h) \left[ y_{ij} \log \frac{F(\eta_{ihj})}{1 - F(\eta_{ihj})} + \log\left(1 - F(\eta_{ihj})\right) \right], \tag{A.2}
\]
where $\eta_{ihj} = x_{ij}\beta + B_h$.

The Fisher scoring method is employed to find the estimates that maximize the approximate marginal log-likelihood (A.2). A normal distribution $N(\hat{\theta}, C^{-1})$ is then built, where $\hat{\theta}$ is the maximum-likelihood estimate and $C$ is the information matrix at convergence, and a Metropolis-Hastings algorithm is employed to perform the sampling with this proposal distribution. The samples are drawn with acceptance probability equal to the minimum of the posterior ratio and 1. For each imputation of the missing values, a draw of the regression parameters $\theta = (\beta, \sigma_b)$ is obtained after the sampler chain reaches convergence. Then, for each cluster $i$, the random effect $b_i$ is drawn from $N(0, \sigma_b^2)$, given the drawn value of $\sigma_b$. Finally, each $y_{ij} \in y_{\mathrm{mis}}$ is drawn from the Bernoulli distribution
\[
y_{ij} \mid x, w, \beta, \sigma_b, b_1, \ldots, b_m \sim \mathrm{Bernoulli}\left(G(x_{ij}\beta + b_i)\right), \quad \text{where } G(\eta) = \operatorname{logit}^{-1}(\eta).
\]
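To illustrate the quadrature step numerically, the following is a minimal Python sketch of the approximation (A.1), accumulated on the log scale. It is a sketch under our own naming, not the authors' implementation: numpy's hermgauss rule integrates against exp(−t²), so the change of variables b = √2 σ_b t and the factor 1/√π convert its nodes and weights into the B_h and W(B_h) of (A.1).

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def approx_marginal_loglik(beta, sigma_b, y, X, cluster, H=15):
    """Gauss-Hermite approximation to the marginal log-likelihood of a
    random-intercept logistic regression model (hypothetical names)."""
    t, w = hermgauss(H)                # nodes/weights for weight exp(-t^2)
    B = np.sqrt(2.0) * sigma_b * t     # transformed quadrature points B_h
    W = w / np.sqrt(np.pi)             # transformed quadrature weights W(B_h)

    loglik = 0.0
    for i in np.unique(cluster):
        idx = cluster == i
        eta = X[idx] @ beta            # x_ij * beta for cluster i
        L_i = np.empty(H)              # cluster-i likelihood factor at each B_h
        for h in range(H):
            p = 1.0 / (1.0 + np.exp(-(eta + B[h])))           # F(eta_ihj)
            L_i[h] = np.prod(p ** y[idx] * (1.0 - p) ** (1 - y[idx]))
        loglik += np.log(np.sum(W * L_i))  # log of the cluster-i sum in (A.1)
    return loglik
```

Maximizing this function (by Fisher scoring, as in the algorithm above, or any numerical optimizer) yields $\hat{\theta}$ and, from the curvature at convergence, the proposal $N(\hat{\theta}, C^{-1})$ for the Metropolis-Hastings step. For clusters with many observations, the per-cluster product would be accumulated on the log scale (e.g., via log-sum-exp) for numerical stability.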
APPENDIX B: ADDITIONAL SIMULATION RESULTS INCLUDING COMPARISONS WITH JOMO

Table B.1. Summary of the Simulation Results when the Missingness Mechanism Is MAR, with Missingness Rates of Approximately 30% for Each Variable and 50% Jointly, and with ICC = 0.2

PI   Method   TV        AE        PB      CR     RMSE     AW
α0   SHRIMP   −1.2536   −1.2374    1.29   93.9   0.4122   1.3090
     PAN      −1.2536   −1.1097   11.48   91.0   0.4267   1.4705
     MICE     −1.2536   −1.3114    4.61   82.5   0.3588   1.0912
     jomo     −1.2536   −1.2734    1.57   93.4   0.4204   1.3101
α1   SHRIMP    5.2489    5.2743    0.48   94.9   0.3936   1.6645
     PAN       5.2489    4.8872    6.89   90.1   0.5117   1.7465
     MICE      5.2489    5.4462    3.76   88.3   0.4693   1.5162
     jomo      5.2489    5.2898    0.78   95.1   0.3978   1.6534
β0   SHRIMP   −1.0091   −0.9625    4.62   94.3   0.2295   0.8366
     PAN      −1.0091   −0.8504   15.72   91.1   0.2607   0.8872
     MICE     −1.0091   −1.0089    0.02   84.6   0.2536   0.6945
     jomo     −1.0091   −0.9301    7.82   86.6   0.2399   0.7245
β1   SHRIMP    0.5121    0.4783    6.60   93.4   0.0790   0.2550
     PAN       0.5121    0.4289   16.25   87.0   0.1002   0.2458
     MICE      0.5121    0.5056    1.26   83.2   0.0865   0.2305
     jomo      0.5121    0.4899    4.33   94.9   0.0710   0.2281
Q    SHRIMP    0.3877    0.3843    0.86   93.7   0.0362   0.1287
     PAN       0.3877    0.3779    2.52   90.6   0.0364   0.1317
     MICE     0.3877    0.3888    0.30   83.1   0.0371   0.1172
     jomo      0.3877    0.3835    1.08   92.9   0.0333   0.1221

Imputation model for Y1: Y1 = λ0 + λ1·Y2 + λ2·Z1 + λ3·Z2 + b^{Y1} + ε.
Imputation model for Y2: logit(P(Y2 = 1)) = ξ0 + ξ1·Y1 + ξ2·Z1 + ξ3·Z2 + b^{Y2}.

Note.—Total sample size of 500 from 50 clusters. Q = P(Y1 > 0, Y2 = 1). PI, parameter of interest; TV, true value; AE, average estimate; PB, percent bias; CR, coverage rate; RMSE, root mean square error; AW, average width.

References

Abramowitz M., Stegun I. A., eds. (1972), Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (9th printing), New York: Dover.
Albert J. (2007), Bayesian Computation with R, New York: Springer.
Andridge R. R. (2011), "Quantifying the Impact of Fixed Effects Modeling of Clusters in Multiple Imputation for Cluster Randomized Trials," Biometrical Journal, 53, 57-74.
Bondarenko I., Raghunathan T. (2016), "Graphical and Numerical Diagnostic Tools to Assess Suitability of Multiple Imputations and Imputation Models," Statistics in Medicine, 35, 3007-3020.
Carpenter J. R., Goldstein H., Kenward M. G. (2011), "REALCOM-IMPUTE Software for Multilevel Multiple Imputation with Mixed Response Types," Journal of Statistical Software, 45, 1-14.
Collins L. M., Schafer J. L., Kam C. M. (2001), "A Comparison of Inclusive and Restrictive Strategies in Modern Missing Data Procedures," Psychological Methods, 6, 330-351.
Demirtas H., Freels S. A., Yucel R. M. (2008), "The Plausibility of Multivariate Normality Assumption when Multiply Imputing Non-Gaussian Continuous Outcomes: A Simulation Assessment," Journal of Statistical Computation and Simulation, 78, 69-84.
Gelman A. (2006), "Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper)," Bayesian Analysis, 1, 515-534.
Graham J. W., Olchowski A. E., Gilreath T. D. (2007), "How Many Imputations Are Really Needed? Some Practical Clarifications of Multiple Imputation Theory," Prevention Science, 8, 206-213.
Hedeker D., Gibbons R. (1994), "A Random-Effects Ordinal Regression Model for Multilevel Analysis," Biometrics, 50, 933-944.
Jennrich R. I., Schluchter M. D. (1986), "Unbalanced Repeated-Measures Models with Structured Covariance Matrices," Biometrics, 42, 805-820.
Kropko J., Goodrich B., Gelman A., Hill J. (2014), "Multiple Imputation for Continuous and Categorical Data: Comparing Joint Multivariate Normal and Conditional Approaches," Political Analysis, 22, 497-519.
Laird N. M., Ware J. H. (1982), "Random-Effects Models for Longitudinal Data," Biometrics, 38, 963-974.
Lee K. J., Carlin J. B. (2010), "Multiple Imputation for Missing Data: Fully Conditional Specification versus Multivariate Normal Imputation," American Journal of Epidemiology, 171, 624-632.
Little R., Rubin D. B. (2002), Statistical Analysis with Missing Data (2nd ed.), New York: John Wiley & Sons.
Liu J., Gelman A., Hill J., Su Y. S., Kropko J. (2014), "On the Stationary Distribution of Iterative Imputations," Biometrika, 101(1), 155-173.
Lohr S. L. (2014), "Design Effects for a Regression Slope in a Cluster Sample," Journal of Survey Statistics and Methodology, 2, 97-125.
McCulloch C. E., Searle S. R. (2001), Generalized, Linear, and Mixed Models, New York: John Wiley & Sons.
Meng X. L. (1994), "Multiple-Imputation Inference with Uncongenial Sources of Input" (with discussion), Statistical Science, 9, 538-573.
Olsen M., Schafer J. L. (2001), "A Two-Part Random-Effects Model for Semicontinuous Longitudinal Data," Journal of the American Statistical Association, 96, 730-745.
Quartagno M., Carpenter J. (2016), jomo: A Package for Multilevel Joint Modelling Multiple Imputation, available at http://cran.r-project.org/package=jomo.
Raghunathan T. E., Lepkowski J. M., Van Hoewyk J., Solenberger P. (2001), "A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models," Survey Methodology, 27, 85-95.
Raudenbush S., Bryk A. (2002), Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.), Thousand Oaks, CA: Sage.
Raudenbush S., Yang M., Yosef M. (2000), "Maximum Likelihood for Generalized Linear Models with Nested Random Effects via High-Order, Multivariate Laplace Approximation," Journal of Computational and Graphical Statistics, 9, 141-157.
Resche-Rigon M., White I. R. (2016), "Multiple Imputation by Chained Equations for Systematically and Sporadically Missing Multilevel Data," Statistical Methods in Medical Research.
Reiter J. P., Raghunathan T. E. (2007), "The Multiple Adaptations of Multiple Imputation," Journal of the American Statistical Association, 102, 1462-1471.
Reiter J. P., Raghunathan T. E., Kinney S. K. (2006), "Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data," Survey Methodology, 32, 143-150.
Royston P., White I. R. (2011), "Multiple Imputation by Chained Equations (MICE): Implementation in Stata," Journal of Statistical Software, 45, 1-20.
Rubin D. B. (1976), "Inference and Missing Data" (with discussion), Biometrika, 63, 581-592.
Rubin D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: John Wiley & Sons.
Rubin D. B. (1996), "Multiple Imputation After 18+ Years" (with discussion), Journal of the American Statistical Association, 91, 473-489.
Rubin D. B., Schenker N. (1986), "Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse," Journal of the American Statistical Association, 81, 366-374.
Schafer J. L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman & Hall.
Schafer J. L. (1999), "Multiple Imputation: A Primer," Statistical Methods in Medical Research, 8, 3-15.
Schafer J. L., Yucel R. M. (2002), "Computational Strategies for Multivariate Linear Mixed-Effects Models with Missing Values," Journal of Computational and Graphical Statistics, 11, 421-442.
Van Buuren S. (2007), "Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification," Statistical Methods in Medical Research, 16, 219-242.
Van Buuren S. (2012), Flexible Imputation of Missing Data, London: CRC Press.
Van Buuren S., Brand J. P., Groothuis-Oudshoorn C. G. M., Rubin D. B. (2006), "Fully Conditional Specification in Multivariate Imputation," Journal of Statistical Computation and Simulation, 76(12), 1049-1064.
Van Buuren S., Groothuis-Oudshoorn C. (2011), "MICE: Multivariate Imputation by Chained Equations in R," Journal of Statistical Software, 45(3), 1-67.
Yucel R. M., He Y., Zaslavsky A. M. (2008), "Using Calibration to Improve Rounding in Imputation," The American Statistician, 62(2), 125-129.
Yucel R. M., He Y., Zaslavsky A. M. (2011), "Gaussian-Based Routines to Impute Categorical Variables in Health Surveys," Statistics in Medicine, 30(29), 3447-3460.

© The Author 2017. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. All rights reserved. For Permissions, please email: journals.permissions@oup.com
