Quality & Quantity 35: 147–160, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.
Reducing Missing Data in Surveys: An Overview of
EDITH D. DE LEEUW
MethodikA Amsterdam, Plantage Doklaan 40, NL-1018 CN Amsterdam, The Netherlands
Abstract. Although item nonresponse can never be totally prevented, it can be considerably reduced,
thereby providing the researcher not only with more usable data, but also with helpful auxiliary
information for better imputation and adjustment. To achieve this, an optimal data collection design
is necessary. Optimization of the questionnaire and of the survey design is the main tool a researcher
has to reduce the amount of missing data in a survey. In this contribution a concise typology
of missing data patterns and their sources is presented. Based on this typology, the
mechanisms responsible for missing data are identified, followed by a discussion of how item nonresponse
can be prevented.
Key words: item nonresponse, causes of missingness, cognitive pretest, data collection mode,
ignorability, question wording, questionnaire development, sensitive questions, survey
In an ideal survey situation, everyone answers all the questions and there is no
nonresponse. However, life is far from ideal and nonresponse does occur. There
are various forms of nonresponse: unit nonresponse, where a whole unit fails to
provide data, and item nonresponse, where data on particular items are missing.
Both unit and item nonresponse can pose serious problems for researchers, or in
the words of Sherlock Holmes: ‘Data! data! data! I can’t make bricks without clay’
(Conan Doyle, 1981). This article is devoted to the prevention of item nonresponse.
For a thorough discussion of the prevention of unit nonresponse in interviews, see
Groves and Couper (1998) and Morton-Williams (1993); for mail surveys see the
classic handbook by Dillman (1978).
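
The distinction between unit and item nonresponse can be made concrete with a small data matrix. The following Python sketch is purely illustrative and not part of the original lecture: the data, the function names `classify` and `item_nonresponse_rates`, and the use of `None` as a missing-value code are all assumptions. It also simplifies by treating any row without a single answer as unit nonresponse.

```python
from typing import List, Optional

# Hypothetical data matrix: one row per sampled unit, one column per item.
# None marks a missing answer (an assumed coding, chosen for illustration).
data: List[List[Optional[int]]] = [
    [3, 4, 2],           # all items answered
    [5, None, 1],        # one item missing
    [None, None, None],  # no data at all for this unit
    [2, 3, None],        # one item missing
]


def classify(row: List[Optional[int]]) -> str:
    """Label a row as complete, item nonresponse, or unit nonresponse."""
    n_missing = sum(value is None for value in row)
    if n_missing == len(row):
        return "unit nonresponse"
    if n_missing > 0:
        return "item nonresponse"
    return "complete"


def item_nonresponse_rates(matrix: List[List[Optional[int]]]) -> List[float]:
    """Share of missing answers per item, computed among responding units only."""
    respondents = [row for row in matrix if classify(row) != "unit nonresponse"]
    if not respondents:
        return []
    n_items = len(respondents[0])
    return [
        sum(row[j] is None for row in respondents) / len(respondents)
        for j in range(n_items)
    ]


labels = [classify(row) for row in data]
rates = item_nonresponse_rates(data)
print(labels)
print(rates)
```

In this toy matrix the third unit contributes nothing and is excluded before the per-item rates are computed, so the rates describe gaps among the units that did respond.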
When item nonresponse occurs, a unit (e.g., a person) provides data, but for
some reason the data on particular items or questions are unavailable for analysis.
In other words, there are gaps in the data matrix. Not so long ago, researchers
simply ignored the problem and restricted their analysis to observed values or complete
cases. However, cases with missing data and the easy solutions of ‘pairwise’
This manuscript is based on an invited lecture for the Social and Economical Sciences Chapters
of the Netherlands Society for Statistics and Operational Research, October 1998. A concise version
was presented at a seminar on item nonresponse of the U.K. Survey Methods Center, London in