EXCLS: enhanced XML clustering by level structure accuracy

Publisher: Inderscience Publishers
Copyright: © Inderscience Enterprises Ltd. All rights reserved
ISSN: 1476-1289
eISSN: 1741-9212
DOI: 10.1504/IJWET.2014.067539

Abstract

The increasing popularity of XML on the internet has raised a number of research problems concerning data management, indexing, and retrieval in large repositories. XML clustering is used to reduce the size of large XML document collections in a repository and so facilitate retrieval operations. Most clustering approaches improve performance by using structure summaries, but at the cost of accuracy. A major drawback of summarisation techniques is the loss of the XML documents’ characteristics. The main objective of this work is to improve the accuracy of XML document clustering, specifically for homogeneous datasets, while preserving performance. To this end, we propose a new XML document structure and present an enhanced matching procedure to calculate the similarity between XML documents. The proposed method is implemented and evaluated using homogeneous and heterogeneous datasets. The experimental results show a significant improvement in clustering accuracy, especially for homogeneous XML documents, without a significant impact on processing time.
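
The abstract does not spell out the proposed document structure or matching procedure, so the following is only a rough illustration of the general idea of comparing XML documents level by level, not the EXCLS method itself. It assumes a hypothetical representation in which each depth of the tree is reduced to the set of element tags found there, and a hypothetical similarity score defined as the average Jaccard overlap of those sets. The names `level_structure` and `level_similarity` and the sample documents are invented for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def level_structure(xml_text):
    """Map each depth (root = 0) to the set of element tags found at that depth."""
    levels = defaultdict(set)
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, depth = stack.pop()
        levels[depth].add(node.tag)
        # Children of this node live one level deeper.
        stack.extend((child, depth + 1) for child in node)
    return dict(levels)


def level_similarity(a, b):
    """Average per-level Jaccard overlap of tag sets (an assumed measure)."""
    depths = set(a) | set(b)
    if not depths:
        return 0.0
    total = 0.0
    for d in depths:
        tags_a, tags_b = a.get(d, set()), b.get(d, set())
        # The union is never empty: depth d exists in at least one document.
        total += len(tags_a & tags_b) / len(tags_a | tags_b)
    return total / len(depths)


# Hypothetical sample documents: doc1 and doc2 share a schema-like shape,
# doc3 comes from an unrelated source.
doc1 = "<book><title/><author><name/></author></book>"
doc2 = "<book><title/><author><email/></author></book>"
doc3 = "<movie><director/></movie>"

s1, s2, s3 = (level_structure(d) for d in (doc1, doc2, doc3))
print(level_similarity(s1, s2))  # two of three levels match exactly
print(level_similarity(s1, s3))  # no shared tags at any level
```

In this toy measure, documents from the same source score high and differ only in the levels where their tags diverge, which is the kind of fine distinction a homogeneous dataset requires. A clustering step would then group documents by these pairwise scores.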

Journal

International Journal of Web Engineering and Technology, Inderscience Publishers

Published: Jan 1, 2014
