Journal of Real Estate Finance and Economics, 29:2, 233–254, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.
Models for Spatially Dependent Missing Data
JAMES P. LESAGE
Department of Economics, University of Toledo, Toledo, OH 43606, U.S.A.
R. KELLEY PACE
LREC Endowed Chair of Real Estate, Department of Finance, E.J. Ourso College of Business Administration,
Louisiana State University, Baton Rouge, LA 70803-6308, U.S.A.
Most hedonic pricing studies using transaction data employ only sold properties. Since the properties sold during
any year, or even decade, represent only a fraction of all properties, this approach ignores the potentially valuable
information contained in unsold properties whose characteristics are known. In fact, information on house
characteristics for all properties, sold and unsold, is often available from assessors. We set forth an estimation
approach that predicts missing values of the dependent variable when the sample data exhibit spatial dependence.
Employing information on the housing characteristics of both sold and unsold properties can improve prediction,
increase estimation efficiency in the missing-at-random case, and reduce self-selection bias in the
non-missing-at-random case. We demonstrate these advantages with a Monte Carlo experiment as well as with
actual housing data.
Key Words: spatial missing data, EM algorithm, sparse matrices, assessment, spatial sample selectivity, hedonic
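To make the idea of predicting missing dependent-variable values under spatial dependence concrete, here is a minimal sketch; it is not the authors' estimator. It assumes a spatial autoregressive (SAR) model y = rho*W*y + X*beta + e with independent Gaussian errors, and it treats rho and beta as known (in practice they must be estimated, for example with the EM approach named in the key words). The k-nearest-neighbor weight matrix and the function names `row_standardized_knn_weights` and `sar_conditional_mean` are hypothetical. Under these assumptions y has mean mu = (I - rho*W)^{-1} X*beta and precision matrix proportional to Q = (I - rho*W)'(I - rho*W), so the best predictor of unsold prices y_m given sold prices y_o is the Gaussian conditional mean E[y_m | y_o] = mu_m - Q_mm^{-1} Q_mo (y_o - mu_o).

```python
import numpy as np


def row_standardized_knn_weights(coords, k):
    """Hypothetical helper: row-standardized k-nearest-neighbor weight matrix W."""
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a property is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]   # indices of the k closest properties
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nbrs.ravel()] = 1.0 / k       # each row sums to one
    return W


def sar_conditional_mean(y_obs, X, W, beta, rho, observed):
    """E[y_missing | y_observed] for y = rho*W*y + X*beta + e, e ~ N(0, s2*I).

    Assumes rho and beta are known; `observed` is a boolean mask (True = sold).
    """
    n = X.shape[0]
    A = np.eye(n) - rho * W
    mu = np.linalg.solve(A, X @ beta)     # unconditional mean (I - rho W)^{-1} X beta
    Q = A.T @ A                           # precision matrix up to the factor 1/s2
    m, o = ~observed, observed
    Q_mm = Q[np.ix_(m, m)]
    Q_mo = Q[np.ix_(m, o)]
    # Gaussian conditioning in precision form; the error variance s2 cancels.
    return mu[m] - np.linalg.solve(Q_mm, Q_mo @ (y_obs - mu[o]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, rho, beta = 300, 0.7, np.array([1.0, 2.0])
    coords = rng.random((n, 2))                      # synthetic property locations
    W = row_standardized_knn_weights(coords, k=6)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    A = np.eye(n) - rho * W
    y = np.linalg.solve(A, X @ beta + 0.5 * rng.normal(size=n))  # simulated SAR prices
    sold = rng.random(n) < 0.3                       # sold = price observed
    spatial = sar_conditional_mean(y[sold], X, W, beta, rho, sold)
    naive = (X @ beta)[~sold]                        # ignores spatial dependence
    print("spatial RMSE:", np.sqrt(np.mean((y[~sold] - spatial) ** 2)))
    print("non-spatial RMSE:", np.sqrt(np.mean((y[~sold] - naive) ** 2)))
```

Because the conditional mean borrows information from observed neighbors, its prediction error should typically be smaller than that of X*beta when rho is well away from zero; with rho = 0 the two predictors coincide. The dense matrices here are only for readability; at the scale of an assessor's database, sparse representations of W would be needed.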
Assessors must provide estimated values for residential property. A common approach
uses the relation between market values of recently sold properties and the
characteristics of these properties to predict market values of unsold properties. Assessors
maintain a database of property characteristics for all houses, sold and unsold, yet this
database is an overlooked data resource. In any given year, only a small fraction of all
properties sell (e.g., two to six percent per year for the data used here), so sold properties
may represent only a small sub-sample of all properties. Moreover, some databases began
relatively recently and thus omit properties sold before their startup date. Other
databases change the variables collected or the procedures used to collect the data, and
so may effectively omit property information from before the change.
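The common assessment approach described above can be sketched as an ordinary least-squares hedonic regression fit to sold properties only and then applied to the characteristics of unsold properties. This is a minimal illustration assuming a linear model in log price; the function name `hedonic_fit_predict` and the synthetic characteristics are hypothetical. Unsold prices are stored as missing (NaN), and the rows of X for unsold properties enter only at the prediction step, which is the information this approach leaves unused during estimation.

```python
import numpy as np


def hedonic_fit_predict(X, log_price, sold):
    """OLS hedonic regression on sold properties; predictions for unsold ones.

    X         : (n, p) characteristics for ALL properties (e.g., from an assessor)
    log_price : (n,) log prices; entries for unsold properties are never read
    sold      : (n,) boolean mask, True where a transaction price is observed
    """
    beta, *_ = np.linalg.lstsq(X[sold], log_price[sold], rcond=None)
    return beta, X[~sold] @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 10_000
    # Synthetic characteristics: intercept, standardized living area, standardized age.
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    log_price = X @ np.array([11.5, 0.3, -0.05]) + 0.2 * rng.normal(size=n)
    sold = rng.random(n) < 0.04                     # about 4% sell, inside the two-to-six percent range
    observed = np.where(sold, log_price, np.nan)    # unsold prices are missing
    beta, predicted = hedonic_fit_predict(X, observed, sold)
    print(f"{sold.mean():.1%} sold; estimated beta = {np.round(beta, 3)}")
```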
We propose treating the unknown transaction prices of unsold properties in these
databases as missing values. Knight et al. (1998) consider least-squares models with
statistically independent observations and impute missing values of both the independent
and dependent variables. In this case, missing value techniques substitute estimated values
for the missing values to produce a repaired data set (Little and Rubin, 2002; Rao and