Efficient query processing for XML keyword queries based on the IDList index

Junfeng Zhou; Zhifeng Bao; Wei Wang; Jinjia Zhao; Xiaofeng Meng

doi:10.1007/s00778-013-0313-2


2013-05-01

Keyword search over XML data has attracted considerable research effort over the last decade. One of the fundamental research problems is how to efficiently answer a given keyword query with respect to a certain query semantics. We found that the key source of inefficiency in existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword $k$ consists of ordered nodes that directly or indirectly contain $k$. We then show that finding keyword query results based on the smallest lowest common ancestor (SLCA) and exclusive lowest common ancestor (ELCA) semantics can be reduced to the ordered set intersection problem, which has been heavily optimized due to its applications in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions, with or without additional indexes. We further propose several algorithms based on hash search that simplify the operation of finding common nodes across all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
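
The reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the toy tree, the `PARENT` and `DIRECT` tables, and all function names are hypothetical, and the abstract does not specify how an IDList is laid out on disk or in memory. The sketch assumes nodes are numbered in document order and that each node's parent is known. Under those assumptions, a node appears in every keyword's IDList exactly when it is a common ancestor, and a smallest lowest common ancestor (SLCA) is a common ancestor that has no common-ancestor child.

```python
from typing import Dict, List, Optional

# Hypothetical toy XML tree: node ID (document-order number) -> parent ID.
PARENT: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 5}

# Hypothetical keyword -> nodes that directly contain the keyword.
DIRECT: Dict[str, List[int]] = {"xml": [3, 6], "search": [4, 7]}


def build_idlist(direct_nodes: List[int], parent: Dict[int, Optional[int]]) -> List[int]:
    """Sorted IDs of all nodes that directly or indirectly contain the keyword."""
    ids = set()
    for node in direct_nodes:
        # Walk up to the root; stop early once an already-recorded ancestor is hit,
        # because all of its ancestors were recorded on an earlier walk.
        while node is not None and node not in ids:
            ids.add(node)
            node = parent[node]
    return sorted(ids)


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Plain merge-style intersection of two sorted ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def slca(keywords: List[str]) -> List[int]:
    """Common ancestors via set intersection, then keep the lowest ones."""
    # Start from the shortest list so intermediate results stay small.
    lists = sorted((build_idlist(DIRECT[k], PARENT) for k in keywords), key=len)
    common = lists[0]
    for other in lists[1:]:
        common = intersect_sorted(common, other)
    # The common-ancestor set is closed under "parent of", so a common node is
    # an SLCA exactly when it is not the parent of another common node.
    parents_of_common = {PARENT[n] for n in common}
    return [n for n in common if n not in parents_of_common]


print(slca(["xml", "search"]))  # [2, 5]
```

In this toy tree the two IDLists are `[1, 2, 3, 5, 6]` and `[1, 2, 4, 5, 7]`; their intersection `[1, 2, 5]` is the set of common ancestors, and dropping the root (which has common-ancestor children) leaves nodes 2 and 5. The merge-based intersection is only one option; the optimized or hash-based variants mentioned in the abstract would replace `intersect_sorted`.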


The VLDB Journal Springer Journals

http://www.deepdyve.com/lp/springer-journals/efficient-query-processing-for-xml-keyword-queries-based-on-the-idlist-MsVn55nyWK



Publisher: Springer Journals
Subject: Computer Science; Database Management
ISSN: 1066-8888
eISSN: 0949-877X
DOI: 10.1007/s00778-013-0313-2