Extended Characteristic Sets: Graph Indexing for SPARQL Query Optimization

Marios Meimaris, George Papastefanatos, Nikos Mamoulis, Ioannis Anagnostopoulos
Data Engineering (ICDE), 2017 IEEE 33rd International Conference on, pp. 497-508. IEEE, 2017
2017
Conference/Workshop
Abstract. SPARQL query execution in state-of-the-art RDF engines depends on, and is often limited by, the underlying storage and indexing schemes. Typically, these systems exhaustively store permutations of the standard three-column triples table. However, even though RDF can give rise to datasets with loosely defined schemas, it is common for an emerging structure to appear in the data. In this paper, we introduce a novel indexing scheme for RDF data that takes advantage of the inherent structure of triples. To this end, we define the Extended Characteristic Set (ECS), a schema abstraction that classifies triples based on the properties of their subjects and objects, and we discuss methods and algorithms for the identification and extraction of ECSs. We show how these can be used to assist query processing, and we implement axonDB, an RDF storage and querying engine based on ECS indexing. We perform an experimental evaluation on real-world and synthetic datasets and observe that axonDB outperforms the competition by a few orders of magnitude.
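
The classification described in the abstract can be illustrated with a minimal sketch. It assumes that a node's characteristic set is the set of properties it appears with as a subject, and that an ECS is the pair formed by the characteristic sets of a triple's subject and object. The function names and the toy data are hypothetical and are not taken from axonDB.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each subject node to the frozenset of properties it emits."""
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    return {node: frozenset(ps) for node, ps in props.items()}


def extended_characteristic_sets(triples):
    """Group triples by (characteristic set of subject, characteristic set of object).

    Assumption: a triple whose object never appears as a subject (for
    example a literal) has no object characteristic set and is skipped.
    """
    cs = characteristic_sets(triples)
    ecs = defaultdict(list)
    for s, p, o in triples:
        if o in cs:
            ecs[(cs[s], cs[o])].append((s, p, o))
    return dict(ecs)


# Hypothetical toy dataset: two people working for the same organisation.
triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "name", "Alice"),
    ("bob", "worksFor", "acme"),
    ("bob", "name", "Bob"),
    ("acme", "locatedIn", "athens"),
    ("acme", "label", "ACME"),
]
ecs_index = extended_characteristic_sets(triples)
```

In this toy dataset both `worksFor` triples fall into the same group, because `alice` and `bob` share one characteristic set and both point to `acme`. A query that joins a person pattern with an organisation pattern could then look up only that group rather than scan the full triples table.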