Bayesian Regularization and Pruning Using a Laplace Prior



Neural Computation, Volume 7 (1) – Jan 1, 1995



Publisher: MIT Press
Copyright: © 1995 Massachusetts Institute of Technology
ISSN: 0899-7667
eISSN: 1530-888X
DOI: 10.1162/neco.1995.7.1.117

Abstract

Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity and that therefore vanish. Since the critical value is determined adaptively during training, pruning—in the sense of setting weights to exact zeros—becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a gaussian regularizer.
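The abstract's central point, that a Laplace prior makes pruning an automatic by-product of regularization, can be illustrated with a small numerical sketch. The example below is hypothetical and is not the paper's training procedure (the paper adapts the regularization constant during training and works with neural network error surfaces): it simply minimises a toy least-squares data error plus the Laplace penalty α Σ|wᵢ| by proximal (soft-thresholding) gradient steps. At the solution, pruned weights are exact zeros with |∂E_D/∂wᵢ| < α, while surviving weights share the common sensitivity |∂E_D/∂wᵢ| = α, mirroring the two classes of weights described in the abstract.

```python
# Illustrative sketch only, not the paper's algorithm: a Laplace prior
# p(w) ~ exp(-alpha * |w|) adds alpha * sum|w_i| to the data error.
# Minimising E_D(w) + alpha * sum|w_i| with soft-thresholding steps
# drives some weights to exact zero (pruning), while surviving weights
# end up with |dE_D/dw_i| = alpha (the common sensitivity).

import numpy as np

rng = np.random.default_rng(0)

# Toy linear "network": data error E_D(w) = 0.5 * ||X w - y||^2
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]            # only three weights really matter
y = X @ w_true + 0.1 * rng.normal(size=n)

alpha = 5.0                               # Laplace prior scale (fixed here; the
                                          # paper determines it adaptively)
eta = 1.0 / np.linalg.norm(X, 2) ** 2     # step size from the Lipschitz constant

def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrinks v toward 0, exact zeros inside [-t, t]."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w = np.zeros(d)
for _ in range(5000):
    grad = X.T @ (X @ w - y)              # dE_D/dw
    w = soft_threshold(w - eta * grad, eta * alpha)

grad = X.T @ (X @ w - y)
print("non-zero weights:", np.count_nonzero(w), "of", d)
print("|dE_D/dw_i| at surviving weights (≈ alpha):", np.round(np.abs(grad[w != 0]), 2))
print("max |dE_D/dw_i| at pruned weights (< alpha):", round(np.abs(grad[w == 0]).max(), 2))
```

Running the sketch, the irrelevant weights are set to exact zeros rather than merely shrunk, so the count of free parameters falls out of the regularizer itself, with no separate pruning step and no hand-chosen cutoff.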

Journal

Neural Computation, MIT Press

Published: Jan 1, 1995
