The increasing popularity of XML on the internet has raised a number of research problems concerning data management, indexing, and retrieval in large repositories. XML clustering is used to reduce the size of large collections of XML documents in a repository in order to facilitate retrieval operations. Most clustering approaches focus on improving performance by using structure summaries, but at the cost of accuracy. A major drawback of summarisation techniques is the loss of XML documents’ characteristics. The main objective of this work is to improve the accuracy of XML document clustering, specifically in the case of homogeneous datasets, while preserving performance. To this end, we propose a new XML document structure and present an enhanced matching procedure to calculate the similarity between XML documents. The proposed method is implemented and evaluated using homogeneous and heterogeneous datasets. The experimental results show a significant improvement in clustering accuracy, especially for homogeneous XML documents, without a significant impact on processing time.
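To make the accuracy cost of summarisation concrete, the following is a minimal illustrative sketch and not the structure or matching procedure proposed in the article, whose details the abstract does not give. It assumes a structure summary is simply the set of distinct root-to-node tag paths, and compares it with a count-preserving variant; the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count every root-to-node tag path in an XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def summary_similarity(a, b):
    """Jaccard similarity over distinct paths (assumed form of a summary)."""
    pa, pb = set(tag_paths(a)), set(tag_paths(b))
    return len(pa & pb) / len(pa | pb)


def multiset_similarity(a, b):
    """Weighted Jaccard over path counts, so repeated elements still matter."""
    ca, cb = tag_paths(a), tag_paths(b)
    # Counter & takes the minimum count per path, Counter | the maximum.
    return sum((ca & cb).values()) / sum((ca | cb).values())


# Hypothetical homogeneous documents: same schema, different repetition.
doc_a = "<lib><book><title/><author/></book></lib>"
doc_b = (
    "<lib><book><title/><author/><author/><author/></book>"
    "<book><title/><author/></book></lib>"
)

print(summary_similarity(doc_a, doc_b))
print(multiset_similarity(doc_a, doc_b))
```

Under these assumptions the two documents share all four distinct paths, so the summary-based score cannot tell them apart, while the count-preserving score can. This is one way a summary may discard the characteristics that distinguish documents within a homogeneous dataset.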
International Journal of Web Engineering and Technology – Inderscience Publishers
Published: Jan 1, 2014