Received: 13 January 2016; Revised: 19 April 2017
Binary response panel data models with sample selection
Anastasia Semykina, Department of Economics, Florida State University, Tallahassee, FL, USA
Jeffrey M. Wooldridge, Department of Economics, Michigan State University, East Lansing, MI, USA
Correspondence: Anastasia Semykina, Department of Economics, Florida State University, Tallahassee, FL 32306-2180, USA.
We consider estimating binary response models on an unbalanced panel, where the
outcome of the dependent variable may be missing due to nonrandom selection, or
there is self-selection into a treatment. In the present paper, we first consider estimation
of sample selection models and treatment effects using a fully parametric
approach, where the error distribution is assumed to be normal in both primary and
selection equations. Arbitrary time dependence in errors is permitted. We discuss estimation of
both coefficients and partial effects, as well as tests for selection bias.
Furthermore, we consider a semiparametric estimator of binary response panel data
models with sample selection that is robust to a variety of error distributions. The
estimator employs a control function approach to account for endogenous selection
and permits consistent estimation of scaled coefficients and relative effects.
Empirical researchers have shown growing interest in estimating binary response panel data models where sample selection and
self-selection issues arise. A sample selection problem is a possibility whenever a panel dataset is unbalanced. For example,
binary response models with unbalanced panels arise in labor economics when studying the probability that a worker is
employed in a job with benefits, where selection arises from nonrandom self-selection into the labor force. In studies that
focus on estimating treatment effects, complications arise if self-selection into the treatment is not random. Estimation methods
that address the selection problem can be helpful to empirical researchers who do policy evaluation with binary responses.
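To illustrate why nonrandom selection matters for a binary response, the following minimal simulation sketch compares the response probability in the full population with the one computed from the selected sample only. It is a deliberately simplified cross-sectional setup without covariates or unobserved effects, not the estimator studied in this paper; the intercepts (0.2 and 0.5), the error correlation `rho`, and the function name `simulate` are all hypothetical choices made for illustration.

```python
import math
import random


def simulate(rho, n=100_000, seed=0):
    """Return (P(y=1) in the full population, P(y=1) among selected units).

    Assumed (hypothetical) design: y = 1[0.2 + u > 0], s = 1[0.5 + v > 0],
    with (u, v) standard bivariate normal and corr(u, v) = rho.
    """
    rng = random.Random(seed)
    total_y = 0   # count of y = 1 over all units
    sel_y = 0     # count of y = 1 among units with s = 1
    sel_n = 0     # number of selected units
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)                        # selection-equation error
        e = rng.gauss(0.0, 1.0)                        # independent noise
        u = rho * v + math.sqrt(1.0 - rho ** 2) * e    # outcome error, corr(u, v) = rho
        y = 1 if 0.2 + u > 0 else 0                    # binary outcome
        s = 1 if 0.5 + v > 0 else 0                    # selection indicator
        total_y += y
        if s:
            sel_y += y
            sel_n += 1
    return total_y / n, sel_y / sel_n


# Correlated errors: the selected sample overstates P(y = 1).
full_dep, sel_dep = simulate(rho=0.8)
# Independent errors: selection is ignorable and the two agree up to noise.
full_ind, sel_ind = simulate(rho=0.0)

print(f"rho=0.8: full={full_dep:.3f}, selected={sel_dep:.3f}")
print(f"rho=0.0: full={full_ind:.3f}, selected={sel_ind:.3f}")
```

In this sketch the gap between the two probabilities appears only when the outcome and selection errors are correlated, which is the situation that selection-correction methods are meant to handle.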
The problem of nonrandom selection has received substantial attention in the theoretical econometrics literature. Several new
methods have been proposed for estimating selection models using panel data. However, the focus of that literature has been
on linear or partially linear panel data models. For example, Wooldridge (1995) and Rochina-Barrachina (1999) propose parametric
estimators of the linear panel data model under sample selection when the explanatory variables are strictly exogenous.
Kyriazidou (1997) derives a semiparametric estimator for such models. Estimation of linear unobserved effects panel data
models with endogenous explanatory variables and nonrandom sample selection was considered, for example, by Charlier,
Melenberg, and van Soest (2001) and Semykina and Wooldridge (2010). In this paper, we discuss estimating binary response
panel data models in the presence of nonrandom selection.
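As a point of reference, a binary response panel data model with binary selection can be sketched as follows. The notation is assumed here for illustration and need not match the notation used later in the paper: $x_{it}$ and $z_{it}$ denote covariates, $c_{i1}$ and $c_{i2}$ unobserved effects, and $u_{it1}$ and $u_{it2}$ idiosyncratic errors.

```latex
\begin{align*}
  y_{it} &= 1\left[ x_{it}\beta + c_{i1} + u_{it1} > 0 \right], \\
  s_{it} &= 1\left[ z_{it}\delta + c_{i2} + u_{it2} > 0 \right], \\
  y_{it} &\text{ is observed only if } s_{it} = 1 .
\end{align*}
```

Under this sketch, selection is nonrandom when $(c_{i1}, u_{it1})$ and $(c_{i2}, u_{it2})$ are correlated conditional on the covariates, so that the observed subsample is not representative of the population of interest.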
We consider two types of selection rules: (i) the selection variable is binary; and (ii) the selection variable is a corner solution or censored.
In the binary selection case, our approach has similarities to the methodology of Meng and Schmidt (1985),
who consider cross-section binary response models. To account for possible correlation between unobserved heterogeneity and
In most applications, the selection variable is a corner solution, where some segment of the population chooses zero. Good examples are hours worked and
quantity purchased of a good. In some cases, the variable is truly censored, especially when observability of y depends on whether an event occurs before a
certain duration. If the duration is censored then the selection variable is properly viewed as censored. The statistical framework is essentially the same. For
brevity, we refer to this case as the censored case.
J Appl Econ. 2018;33:179–197. wileyonlinelibrary.com/journal/jae Copyright © 2017 John Wiley & Sons, Ltd. 179