Purpose – Estimating the sizes of query results and intermediate results is crucial to many aspects of query processing. All database systems rely on the use of cardinality estimates to choose the cheapest execution plan. In principle, the problem of cardinality estimation is more complicated in the Extensible Markup Language (XML) domain than the relational domain. The purpose of this paper is to present a novel framework for estimating the cardinality of XQuery expressions as well as their sub‐expressions. Additionally, this paper proposes a novel XQuery cardinality estimation benchmark. The main aim of this benchmark is to establish the basis of comparison between the different estimation approaches in the XQuery domain. Design/methodology/approach – As a major innovation, the paper exploits the relational algebraic infrastructure to provide accurate estimation in the context of XML and XQuery domains. In the proposed framework, XQuery expressions are translated into an equivalent relational algebraic plans and then using a well defined set of inference rules and a set of special properties of the algebraic plan, this framework is able to provide high‐accurate estimation for XQuery expressions. Findings – This paper is believed to be the first which provides a uniform framework to estimate the cardinality of more powerful XML querying capabilities using XQuery expressions as well as their sub‐expressions. It exploits the relational algebraic infrastructure to provide accurate estimation in the context of XML and XQuery domains. Moreover, the proposed framework can act as a meta‐model through its ability to incorporate different summarized XML structures and different histogram techniques which allows the model designers to achieve their targets by focusing their effort on designing or selecting the adequate techniques for them. In addition, this paper proposes benchmark for XQuery cardinality estimation systems. The proposed benchmark distinguishes itself from the other existing XML benchmarks in its focus on establishing the basis for comparing the different estimation approaches in the XML domain in terms of their accuracy of the estimations and their completeness in handling different XML querying features. Research limitations/implications – The current status of this proposed XQuery cardinality estimations framework does not support the estimation of the queries over the order information of the source XML documents and does not support non‐numeric predicates. Practical implications – The experiments of this XQuery cardinality estimation system demonstrate its effectiveness and show high‐accurate estimation results. Utilizing the cardinality estimation properties during the SQL translation of XQuery expression results in an average improvement of 20 percent on the performance of their execution times. Originality/value – This paper presents a novel framework for estimating the cardinality of XQuery expressions as well as its sub‐expressions. A novel XQuery cardinality estimation benchmark is introduced to establish the basis of comparison between the different estimation approaches in the XQuery domain.
International Journal of Web Information Systems – Emerald Publishing
Published: Apr 4, 2008
Keywords: Benchmarking; Algebra; Database management; Extensible Markup Language