In Proceedings of the 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with EDBT/ICDT 2021 Joint Conference, DOLAP@EDBT/ICDT 2021, Nicosia, Cyprus
Abstract. In-situ processing has received a great deal of attention in recent years. In in-situ scenarios, big raw data files that do not fit in main memory must be handled efficiently on commodity hardware, without the overhead of a preprocessing phase or of loading the data into a database. In this work, we present an adaptive indexing scheme that enables efficient visual exploration and analytics over big raw data files. Beyond visual exploration and statistics, the scheme supports category-based analytics using group-by and filter operations. The proposed scheme combines a tile-based structure, which offers efficient exploratory operations over the 2D space, with a tree-based structure that organizes a tile’s objects based on their categorical values, enabling efficient visual analytics and support for advanced visualization methods. The index resides in main memory and is built progressively as the user explores parts of the raw file, while its structure and level of granularity are adjusted to the user’s exploration areas and type of analysis. We conduct experiments using real and synthetic datasets and demonstrate that the proposed approach is, in most cases, more than 40× faster than existing solutions and performs around 3 orders of magnitude fewer I/O operations.
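To make the combination of a tile-based structure with per-tile categorical organization more concrete, the following is a minimal illustrative sketch, not the scheme evaluated in this work. It makes several simplifying assumptions: the grid is uniform and fixed rather than adaptive, a flat dictionary stands in for the tree-based structure inside each tile, only a count and a sum are kept per category, and no raw-file access is modeled. All names (`Tile`, `TileIndex`, `insert`, `group_by`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Tile:
    # category -> [count, sum of a numeric attribute].
    # Assumption: a flat dict replaces the tree-based structure of the paper.
    stats: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0.0]))


class TileIndex:
    """Hypothetical in-memory index: a uniform grid of lazily created tiles."""

    def __init__(self, x_min, y_min, x_max, y_max, cols, rows):
        self.x_min, self.y_min = x_min, y_min
        self.tile_w = (x_max - x_min) / cols
        self.tile_h = (y_max - y_min) / rows
        self.cols, self.rows = cols, rows
        self.tiles = {}  # (col, row) -> Tile, created on first insert

    def _cell(self, x, y):
        # Map a point to its grid cell, clamping to the grid boundaries.
        col = min(max(int((x - self.x_min) / self.tile_w), 0), self.cols - 1)
        row = min(max(int((y - self.y_min) / self.tile_h), 0), self.rows - 1)
        return col, row

    def insert(self, x, y, category, value):
        # Update the per-category aggregates of the tile containing (x, y).
        tile = self.tiles.setdefault(self._cell(x, y), Tile())
        entry = tile.stats[category]
        entry[0] += 1
        entry[1] += value

    def group_by(self, x_lo, y_lo, x_hi, y_hi):
        # Merge per-category aggregates of every tile the window overlaps.
        # Assumption: tile granularity only, so the result is exact only
        # when the window is aligned with tile boundaries.
        c_lo, r_lo = self._cell(x_lo, y_lo)
        c_hi, r_hi = self._cell(x_hi, y_hi)
        merged = defaultdict(lambda: [0, 0.0])
        for col in range(c_lo, c_hi + 1):
            for row in range(r_lo, r_hi + 1):
                tile = self.tiles.get((col, row))
                if tile is None:
                    continue
                for category, (count, total) in tile.stats.items():
                    merged[category][0] += count
                    merged[category][1] += total
        return {c: (n, s) for c, (n, s) in merged.items()}
```

In this sketch a group-by over a window is answered from the aggregates stored in the overlapped tiles, without touching the individual objects. A window that only partially covers a tile would need either finer tiles or access to the underlying objects, which the sketch leaves out.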