Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You and Your Team.

Learn More →

Using visual content‐based analysis with textual and structural analysis for improving web filtering

Using visual content‐based analysis with textual and structural analysis for improving web filtering Along with the ever growingWeb is the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine learning based pornographic website classification and filtering system. Facing the Internet more and more visual and multimedia as exemplified by pornographic websites, we focus here our attention on the use of skin color related visual content based analysis along with textual and structural content based analysis for improving pornographic website filtering. While the most commercial filtering products on the marketplace are mainly based on textual content‐based analysis such as indicative keywords detection or manually collected black list checking, the originality of our work resides on the addition of structural and visual content‐based analysis to the classical textual content‐based analysis along with several major‐data mining techniques for learning and classifying. Experimented on a testbed of 400 websites including 200 adult sites and 200 non pornographic ones, WebGuard, our Web filtering engine scored a 96.1% classification accuracy rate when only textual and structural content based analysis are used, and 97.4% classification accuracy rate when skin color related visual content based analysis is driven in addition. Further experiments on a black list of 12 311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored 87.82% classification accuracy rate when using only textual and structural content‐based analysis, and 95.62% classification accuracy rate when the visual content‐based analysis is driven in addition. The basic framework of WebGuard can apply to other categorization problems of websites which combine, as most of them do today, textual and visual content. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Web Information Systems Emerald Publishing

Using visual content‐based analysis with textual and structural analysis for improving web filtering

Loading next page...
 
/lp/emerald-publishing/using-visual-content-based-analysis-with-textual-and-structural-4XkpvwHgGF
Publisher
Emerald Publishing
Copyright
Copyright © 2005 Emerald Group Publishing Limited. All rights reserved.
ISSN
1744-0084
DOI
10.1108/17440080580000096
Publisher site
See Article on Publisher Site

Abstract

Along with the ever growingWeb is the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine learning based pornographic website classification and filtering system. Facing the Internet more and more visual and multimedia as exemplified by pornographic websites, we focus here our attention on the use of skin color related visual content based analysis along with textual and structural content based analysis for improving pornographic website filtering. While the most commercial filtering products on the marketplace are mainly based on textual content‐based analysis such as indicative keywords detection or manually collected black list checking, the originality of our work resides on the addition of structural and visual content‐based analysis to the classical textual content‐based analysis along with several major‐data mining techniques for learning and classifying. Experimented on a testbed of 400 websites including 200 adult sites and 200 non pornographic ones, WebGuard, our Web filtering engine scored a 96.1% classification accuracy rate when only textual and structural content based analysis are used, and 97.4% classification accuracy rate when skin color related visual content based analysis is driven in addition. Further experiments on a black list of 12 311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored 87.82% classification accuracy rate when using only textual and structural content‐based analysis, and 95.62% classification accuracy rate when the visual content‐based analysis is driven in addition. The basic framework of WebGuard can apply to other categorization problems of websites which combine, as most of them do today, textual and visual content.

Journal

International Journal of Web Information SystemsEmerald Publishing

Published: Nov 1, 2005

Keywords: Web classification and categorization; Datamining; Web textual and structural content; Visual content analysis; Skin color model; Pornographic website filtering

There are no references for this article.

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$499/year

Save searches from
Google Scholar,
PubMed

Create folders to
organize your research

Export folders, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month