Processing and evaluating partial tree pattern queries on xml data Full text

Xiaoying Wu, Stefanos Souldatos, Dimitri Theodoratos, Theodore Dalamagas, Yannis Vassiliou, and Timos Sellis
IEEE Transactions on Knowledge and Data Engineering (TKDE)
Abstract. In this paper, we consider a class of queries, called Partial Tree-Pattern Queries (PTPQs), which generalizes and strictly contains TPQs. PTPQs represent a broad fragment of XPath. In order to process PTPQs, we introduce a set of sound and complete inference rules to characterize structural relationship derivation. We provide necessary and sufficient conditions for detecting query unsatisfiability and node redundancy. We also show that PTPQs can be represented as directed acyclic graphs augmented with "same-path" constraints. In order to leverage existing efficient evaluation algorithms for less expressive classes of queries, we design two approaches that evaluate a PTPQ by decomposing it into a set of simpler queries. We also develop, $PartialTreeStack$, an original polynomial time holistic algorithm for PTPQs. To the best of our knowledge, this is the first algorithm to support the evaluation of such a broad structural fragment of XPath in the inverted lists evaluation model. We provide a theoretical analysis of our algorithm and identify cases where it is asymptotically optimal. An extensive experimental evaluation shows that it is more efficient, robust and stable than the other two and it outperforms a state-of-the art XQuery engine on PTPQs.