VLDB J. (2022)
Abstract. In in situ data management scenarios, large data files that do not fit in main memory must be handled efficiently on commodity hardware, without the overhead of a preprocessing phase or of loading the data into a database. In this work, we study the challenges posed by visual analysis tasks in in situ scenarios under memory constraints. We present an indexing scheme and adaptive query evaluation techniques that enable efficient categorical-based group-by and filter operations, combined with 2D visual interactions such as the exploration of data points on maps or scatter plots. The indexing scheme combines a tile-based structure, which offers efficient visual exploration over the 2D plane, with a tree-based structure, which organizes each tile's objects based on their categorical values. The index is constructed on-the-fly, resides in main memory, and is built progressively as the user explores parts of the raw file; its structure and level of granularity are adjusted to the user's exploration areas and type of analysis. To handle cases where limited resources are available, we introduce a resource-aware index initialization mechanism, formulate it as an NP-hard optimization problem, and propose two efficient approximation algorithms to solve it. We conduct extensive experiments using real and synthetic datasets and demonstrate that our approach achieves interactive query response times (less than 0.04 sec) and, in most cases, is more than 100× faster and performs up to two orders of magnitude fewer I/O operations than existing solutions. The proposed methods are implemented as part of an open-source system for in situ visual exploration and analytics.
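To make the combination of a tile-based structure with a per-tile categorical organization more concrete, the following is a minimal illustrative sketch, not the system's actual implementation. It assumes a fixed uniform grid and a single categorical attribute, and it replaces the tree-based structure with a one-level dictionary per tile; the progressive construction, adaptive granularity, and resource-aware initialization described above are omitted. All names (`Tile`, `TileIndex`, `count_by_category`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Tile:
    # Maps a categorical value to the (x, y, file_offset) entries of the
    # objects in this tile that carry that value. This dictionary is a
    # simplified stand-in for the per-tile tree-based structure.
    by_category: dict = field(default_factory=lambda: defaultdict(list))


class TileIndex:
    """Uniform grid over a 2D domain; each tile groups objects by category."""

    def __init__(self, x_min, y_min, x_max, y_max, cells_per_axis):
        self.x_min, self.y_min = x_min, y_min
        self.n = cells_per_axis
        self.cell_w = (x_max - x_min) / cells_per_axis
        self.cell_h = (y_max - y_min) / cells_per_axis
        self.tiles = {}  # (col, row) -> Tile, created only when needed

    def _tile_key(self, x, y):
        # Clamp so that points on or beyond the domain border map to edge tiles.
        col = max(0, min(int((x - self.x_min) / self.cell_w), self.n - 1))
        row = max(0, min(int((y - self.y_min) / self.cell_h), self.n - 1))
        return col, row

    def insert(self, x, y, category, offset):
        tile = self.tiles.setdefault(self._tile_key(x, y), Tile())
        tile.by_category[category].append((x, y, offset))

    def count_by_category(self, qx_min, qy_min, qx_max, qy_max):
        """Group-by count of the objects inside a rectangular 2D window."""
        counts = defaultdict(int)
        c0, r0 = self._tile_key(qx_min, qy_min)
        c1, r1 = self._tile_key(qx_max, qy_max)
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                tile = self.tiles.get((col, row))
                if tile is None:
                    continue
                for category, entries in tile.by_category.items():
                    for x, y, _ in entries:
                        if qx_min <= x <= qx_max and qy_min <= y <= qy_max:
                            counts[category] += 1
        return dict(counts)


index = TileIndex(0.0, 0.0, 10.0, 10.0, cells_per_axis=2)
index.insert(1.0, 1.0, "a", offset=0)
index.insert(2.0, 2.0, "b", offset=10)
index.insert(7.0, 7.0, "a", offset=20)
index.insert(9.0, 1.0, "a", offset=30)
window_counts = index.count_by_category(0.0, 0.0, 5.0, 5.0)
```

In this sketch only the tiles that overlap the query window are visited, and the stored file offsets indicate where an object's remaining attributes could be read from the raw file if a query needed them.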