Learning and Evaluating Classifiers under Sample Selection Bias

Bianca Zadrozny, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, zadrozny@us.ibm.com

Abstract

Classifier learning methods commonly assume that the training data consist of randomly drawn examples from the same distribution as the test examples about which the learned model is expected to make predictions. In many practical situations, however, this assumption is violated, in a problem known in econometrics as sample selection bias. In this paper, we formalize the sample selection bias problem in machine learning terms and study analytically and experimentally how a number of well-known classifier learning methods are affected by it. We also present a bias correction method that is particularly useful for classifier evaluation under sample selection bias.

In both cases, even though the available examples are not a random sample from the true underlying distribution of examples, we would like to learn a predictor from the examples that is as accurate as possible for this distribution. Furthermore, we would like to be able to estimate its accuracy for the whole population using the available data. This problem has received a great deal of attention in econometrics, where it is known as sample selection bias.
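To illustrate the evaluation problem the abstract describes, the sketch below uses inverse-selection-probability weighting to estimate a classifier's population accuracy from a biased sample. This is a minimal illustration under stated assumptions, not the paper's exact method: the selection probability P(s=1 | x) is assumed to be known, and all data, names, and thresholds are synthetic and hypothetical.

```python
# Minimal sketch of bias-corrected evaluation under sample selection bias,
# assuming the selection probability P(s=1 | x) is known (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Population: x ~ Uniform(0, 1), label y = 1 iff x + noise > 0.5.
n = 100_000
x = rng.uniform(0, 1, n)
y = (x + rng.normal(0, 0.1, n) > 0.5).astype(int)

# A fixed classifier to be evaluated: predict 1 iff x > 0.4.
pred = (x > 0.4).astype(int)
true_acc = (pred == y).mean()            # accuracy on the whole population

# Biased sample: examples with large x are more likely to be selected.
p_select = 0.1 + 0.8 * x                 # P(s = 1 | x), assumed known here
s = rng.uniform(0, 1, n) < p_select

naive_acc = (pred[s] == y[s]).mean()     # ignores the selection bias

# Bias-corrected estimate: weight each selected example by 1 / P(s=1 | x)
# (a self-normalized importance-weighting estimator).
w = 1.0 / p_select[s]
corrected_acc = np.average(pred[s] == y[s], weights=w)

print(f"population accuracy: {true_acc:.3f}")
print(f"naive estimate:      {naive_acc:.3f}")
print(f"weighted estimate:   {corrected_acc:.3f}")
```

Because the selection mechanism over-represents examples where the classifier is correct, the naive estimate is optimistic, while the weighted estimate tracks the population accuracy; the same reweighting idea applies to learning when a method accepts example weights.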