Ann. Data. Sci.
Mining and Classifying Images from an Advertisement
Received: 13 July 2017 / Revised: 22 March 2018 / Accepted: 30 April 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018
Abstract AdEater is an early browsing assistant that automatically removes
advertisement images from internet pages. It works by generating rules from training
data and applying these rules while the user browses the internet. Advertisement
images on web pages are replaced by transparent images that display the word “ad”;
where an image is misclassified, a non-advertisement image is replaced in the same
way. This paper critically examines the dataset derived from a trial of AdEater and
attempts to build a robust image classifier. We apply data mining techniques to
uncover associations between features of advertisements and non-advertisements, and
we predict whether images are advertisements or non-advertisements using three
classification methods. We achieve a classification accuracy of 96.5%, using k-fold
cross validation to train and test the model.
Keywords AdEater · Classification trees · Machine learning · Data mining · Artificial
intelligence · Support vector machine · k-means clustering · Silhouette · Association
Electronic supplementary material The online version of this article
(https://doi.org/10.1007/s40745-018-0164-1) contains supplementary material, which is
available to authorized users.
University College Dublin, Belfield, Dublin 4, Ireland