Purpose – The growth of the web and the increasing number of documents electronically available has been paralleled by the emergence of harmful web pages content such as pornography, violence, racism, etc. This emergence involved the necessity of providing filtering systems designed to secure the internet access. Most of them process mainly the adult content and focus on blocking pornography, marginalizing violence. The purpose of this paper is to propose a violent web content detection and filtering system, which uses textual and structural content‐based analysis. Design/methodology/approach – The violent web content detection and filtering system uses textual and structural content‐based analysis based on a violent keyword dictionary. The paper focuses on the keyword dictionary preparation, and presents a comparative study of different data mining techniques to block violent content web pages. Findings – The solution presented in this paper showed its effectiveness by scoring a 89 per cent classification accuracy rate on its test data set. Research limitations/implications – Many future work directions can be considered. This paper analyzed only the web page, and an additional analysis of the visual content can be one of the directions of future work. Future research is underway to develop effective filtering tools for other types of harmful web pages, such as racist, etc. Originality/value – The paper's major contributions are first, the study and comparison of several decision tree building algorithms to build a violent web classifier based on a textual and structural content‐based analysis for improving web filtering. Second, showing laborious dictionary building by finding automatically discriminative indicative keywords.
International Journal of Web Information Systems – Emerald Publishing
Published: Nov 21, 2008
Keywords: Worldwide web; Classification schemes; Data analysis; Content management