Quality & Quantity 37: 363–376, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
Methods for the Analysis of Explanatory Linear
Regression Models with Missing Data Not at Random
JOSÉ BLAS NAVARRO PASTOR
Departament de Psicobiologia i Metodologia, Universitat Autònoma de Barcelona, Edifici B, 08193
Abstract. Since the work of Little and Rubin (1987), no substantial advances have been achieved in
the analysis of explanatory regression models for incomplete data that are missing not at random,
mainly due to the difficulty of verifying the randomness of the unknown data. In practice, the analysis
of nonrandom missing data is carried out with techniques designed for datasets with random or completely
random missing data, such as complete case analysis, mean imputation, regression imputation, maximum
likelihood or multiple imputation. However, the data conditions required to minimize the bias derived
from an incorrect analysis have not been fully determined. In the present work, several Monte Carlo
simulations were carried out to establish which analysis strategy for random missing
data performs best when applied to datasets with nonrandom missing data. The factors manipulated in
the simulations are sample size, percentage of missing data, predictive power of the imputation model,
and existence of interaction between predictors. The results show that the smallest bias is obtained
with maximum likelihood and multiple imputation techniques, although with low percentages of missing
data, absence of interaction, and high predictive power of the imputation model (data structures that
are frequent in research on child and adolescent psychopathology), acceptable results are obtained with the simplest
Key words: nonrandom missing data; regression analysis; incomplete maximum likelihood estimation;
multiple imputation; Monte Carlo simulation.
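The kind of Monte Carlo comparison summarised in the abstract can be illustrated with a minimal sketch. It is not the simulation design used in this study: the missingness mechanism, the parameter values, and the names `simulate_once` and `monte_carlo` are assumptions made purely for illustration, and only two of the simplest strategies (complete-case analysis and unconditional mean imputation) are compared. The correlation `rho` between the two predictors stands in, loosely, for the predictive power of an imputation model.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate_once(rng, n=200, miss_rate=0.2, rho=0.5, beta=(1.0, 0.5, 0.5)):
    """One replication: x1 slope estimates under two simple strategies."""
    # Two correlated standard-normal predictors and a linear outcome.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = beta[0] + X @ np.array(beta[1:]) + rng.normal(size=n)

    # Assumed nonrandom mechanism (hypothetical): the larger x1 is,
    # the more likely it is to be missing.
    score = X[:, 0] + rng.normal(scale=0.5, size=n)
    missing = score > np.quantile(score, 1.0 - miss_rate)

    # Complete-case (listwise) analysis: drop rows where x1 is missing.
    cc = ols(X[~missing], y[~missing])

    # Unconditional mean imputation: fill x1 with its observed mean.
    X_imp = X.copy()
    X_imp[missing, 0] = X[~missing, 0].mean()
    mean_imp = ols(X_imp, y)

    return cc[1], mean_imp[1]


def monte_carlo(n_reps=200, seed=0, beta=(1.0, 0.5, 0.5), **kwargs):
    """Average bias of the x1 slope over n_reps replications."""
    rng = np.random.default_rng(seed)
    est = np.array(
        [simulate_once(rng, beta=beta, **kwargs) for _ in range(n_reps)]
    )
    bias = est.mean(axis=0) - beta[1]
    return {"complete_case": float(bias[0]), "mean_imputation": float(bias[1])}


if __name__ == "__main__":
    for rate in (0.1, 0.3):
        print(rate, monte_carlo(n=200, miss_rate=rate, rho=0.5))
```

In this sketch the missingness depends only on the predictor itself, which is one of many possible nonrandom mechanisms. Varying `n`, `miss_rate` and `rho` mimics three of the four factors listed in the abstract; an interaction term between predictors would have to be added to the outcome model separately.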
Twenty years ago, Greenlees et al. (1982) wrote: “there is a large literature on the
problem of parameter estimation, but with few exceptions this literature treats the
case in which the missing values are missing at random”. Although substantial
advances have been made, this statement continues to be valid.
In the last twenty years, many researchers have assessed the requirements of different
methods for the analysis of incomplete data, showing that single imputation
(unconditional or conditional mean, stochastic regression, hot deck, artificial neural
networks, etc.), complete-case or listwise analysis, available-case or pairwise analysis,
maximum likelihood (Expectation-Maximisation or EM algorithm, structural