Sustainable Data Lakes for Extreme-Scale Analytics
R&D Project - European

In the era of Big Data, decision-making processes are becoming increasingly data-driven and data-intensive. The Data Lake approach refers to assembling large amounts of diverse data from a multitude of data sources, retaining their original model and format, and allowing users to query and analyze them in situ. Thus, it promises to enable ad hoc, self-service analytics and to reduce the time required to get from data to insights.

SmartDataLake aims to design, develop and evaluate novel approaches, techniques and tools for extreme-scale analytics over Big Data Lakes. It tackles the challenges of reducing costs and extracting value from Big Data Lakes by providing solutions for virtualized and adaptive data access; automated and adaptive data storage tiering; smart data discovery, exploration and mining; monitoring and assessing the impact of changes; and empowering the data scientist in the loop through scalable and interactive data visualizations.

The results of the project are evaluated in real-world use cases from the Business Intelligence domain, including scenarios for portfolio recommendation, production planning and pricing, and investment decision making.