On-line Supervised Spam Filter Evaluation
Eleven variants of six widely used open-source spam filters are tested on a chronological sequence of 49,086 e-mail messages received by an individual from August 2003 through March 2004. Our approach differs from those previously reported in that the test set is large, comprises uncensored raw messages, and is presented to each filter sequentially with incremental feedback. Misclassification rates and Receiver Operating Characteristic (ROC) curve measurements are reported, with statistical confidence intervals. Quantitative results indicate that content-based filters can eliminate 98% of spam while incurring a 0.1% loss of legitimate e-mail. Qualitative results indicate that the risk of loss depends on the nature of the message, and that messages likely to be lost may be those that are less critical. More generally, our methodology has been encapsulated in a free software toolkit, which may be used to conduct similar experiments.
ACM Transactions on Information Systems (TOIS) – Association for Computing Machinery
Published: Jul 1, 2007
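The sequential protocol described in the abstract (each message is classified first, and only then is its true label fed back to the filter) can be illustrated with a minimal Python sketch. This is not the authors' toolkit: `evaluate_online`, `TokenCountFilter`, and the scoring rule are hypothetical names and choices made for illustration only, and the toy filter stands in for a real content-based filter.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


class TokenCountFilter:
    """Toy stand-in for a content-based filter (an assumption, not a real filter)."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()
        self.ham_counts: Counter = Counter()

    def classify(self, text: str) -> bool:
        """Return True if the message looks like spam given what was seen so far."""
        tokens = text.lower().split()
        score = sum(self.spam_counts[t] - self.ham_counts[t] for t in tokens)
        return score > 0

    def train(self, text: str, is_spam: bool) -> None:
        """Incremental feedback: record the tokens under the true label."""
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(text.lower().split())


def evaluate_online(
    messages: Iterable[Tuple[str, bool]],
    classify: Callable[[str], bool],
    train: Callable[[str, bool], None],
) -> Dict[str, float]:
    """Present messages in chronological order: classify, score, then give feedback."""
    ham = spam = ham_errors = spam_errors = 0
    for text, is_spam in messages:
        predicted_spam = classify(text)  # the filter has not yet seen this label
        if is_spam:
            spam += 1
            spam_errors += int(not predicted_spam)
        else:
            ham += 1
            ham_errors += int(predicted_spam)
        train(text, is_spam)  # the true label is revealed only after classification
    return {
        "ham_misclassification": ham_errors / ham if ham else 0.0,
        "spam_misclassification": spam_errors / spam if spam else 0.0,
    }


if __name__ == "__main__":
    stream = [
        ("cheap pills now", True),
        ("meeting at noon", False),
        ("cheap pills today", True),
        ("lunch meeting tomorrow", False),
    ]
    toy = TokenCountFilter()
    print(evaluate_online(stream, toy.classify, toy.train))
```

Because every message is scored before the filter learns its label, early errors made while the filter is still untrained count toward the reported rates, which is what distinguishes this setup from a batch train/test split.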