Sample-efficient strategies for learning in the presence of noise


Publisher
Association for Computing Machinery
Copyright
Copyright © 1999 by ACM Inc.
ISSN
0004-5411
DOI
10.1145/324133.324221
Publisher site
See Article on Publisher Site

Abstract

In this paper, we prove various results about PAC learning in the presence of malicious noise. Our main interest is the sample size behavior of learning algorithms. We prove the first nontrivial sample complexity lower bound in this model by showing that on the order of ε/Δ² + d/Δ (up to logarithmic factors) examples are necessary for PAC learning any target class of {0,1}-valued functions of VC dimension d, where ε is the desired accuracy and η = ε/(1 + ε) − Δ is the malicious noise rate (it is well known that no nontrivial target class can be PAC learned with accuracy ε and malicious noise rate η ≥ ε/(1 + ε), irrespective of sample complexity). We also show that this result cannot be significantly improved in general by presenting efficient learning algorithms for the class of all subsets of d elements and the class of unions of at most d intervals on the real line. This is especially interesting, as we can also show that the popular minimum disagreement strategy needs samples of size dε/Δ², and hence is not optimal with respect to sample size. We then discuss the use of randomized hypotheses. For these, the bound ε/(1 + ε) on the noise rate no longer holds and is replaced by 2ε/(1 + 2ε). In fact, we present a generic algorithm using randomized hypotheses that can tolerate noise rates slightly larger than ε/(1 + ε) while using samples of size d/ε, as in the noise-free case. Again one observes a quadratic power law (in this case dε/Δ², with Δ = 2ε/(1 + 2ε) − η) as Δ goes to zero. We show upper and lower bounds of this order.
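The minimum disagreement strategy analyzed in the abstract can be illustrated with a toy sketch: draw a noisy sample, then pick the candidate hypothesis with the fewest disagreements on it. The threshold class, the grid of candidates, and the label-flipping adversary below are simplifying assumptions for illustration only; in the paper's malicious noise model the adversary may replace a corrupted example with an arbitrary (x, y) pair, not merely flip the label.

```python
import random

def minimum_disagreement(sample, hypotheses):
    """Return the hypothesis with the fewest disagreements on the sample.

    sample: list of (x, y) pairs, a fraction of which may be corrupted.
    hypotheses: finite list of candidate {0,1}-valued functions.
    """
    return min(hypotheses, key=lambda h: sum(h(x) != y for x, y in sample))

def threshold(t):
    # A simple {0,1}-valued hypothesis class on [0, 1] (VC dimension 1).
    return lambda x: 1 if x >= t else 0

random.seed(0)
target = threshold(0.5)
eta = 0.1  # noise rate, well below the ε/(1 + ε) barrier for small ε

# Draw 500 examples; with probability eta the adversary corrupts the
# label (here simply flipped, a weak special case of malicious noise).
sample = []
for _ in range(500):
    x = random.random()
    y = target(x)
    if random.random() < eta:
        y = 1 - y
    sample.append((x, y))

# Candidate hypotheses: thresholds on a grid of 101 points.
grid = [threshold(t / 100) for t in range(101)]
best = minimum_disagreement(sample, grid)
```

With this much data the minimizer lands close to the true threshold 0.5; the paper's point is the finer-grained one that this strategy's sample size scales as dε/Δ², which is not optimal.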

Journal

Journal of the ACM (JACM)

Published: Sep 1, 1999
