Department of Big Data Analytics and Machine Learning

The department conducts research, technological development and innovation in the fields of large-scale algorithms and systems for the management, processing and analysis of large and heterogeneous volumes of static and dynamic data. In this context, key research directions include efficient and interactive analysis of Big Data for different application domains, such as Complex Event Processing, the extraction and dynamic update of complex Machine Learning models, and the development of Predictive Analytics models. The department also covers the analysis of continuous data streams, platforms and tools for scalable data analytics, algorithms and systems for large-scale supervised and unsupervised Machine Learning, and Privacy-Preserving Data Mining.

The department works towards scaling up Data Science and Machine Learning solutions to operate on Big Data, extending them to new forms of hardware and communication services, and investigating multi-disciplinary approaches across different sectors at a larger scale than today. In particular, the department investigates the following directions:

  • Big Data and Data Science. Data Science receives great attention due to the availability of huge volumes of data and the opportunity to mine them and extract useful knowledge. The department leverages its significant expertise in Big Data management to solve practical and challenging problems related to Data Science on Big Data. In particular, we investigate how the statistical methods of Data Science can be combined with the scalability demands of Big Data, particularly for non-conventional data types such as temporal, spatial, graph, stream, and scientific data.

  • Data Management for Machine Learning. Machine Learning has made great advances, particularly on non-conventional data modalities relating to language and vision, such as text, images, audio and video, which make up the greatest volume of data being generated. Deep Learning offers the opportunity not only to solve traditional tasks such as making discrete or continuous predictions, but also to learn compact, powerful representations of the data itself. Such representations can be used for efficient storage and for addressing new tasks, such as retrieval, clustering and exploration of data. Despite progress in self-supervised learning at the scale of billions, it remains largely unexplored how to apply such methods to Big Data. We address such challenges by using compact, hierarchical representations of the data and large-scale indexing, depending on the modality. One example is continual learning on data streams, where a possible direction is to go beyond the standard class-incremental setting towards instance-incremental, self-supervised learning.

  • Machine Learning for Data Management. With the recent, rapid advances in Machine Learning technology, a new trend has emerged in data management research. It explores opportunities to enrich the currently programmed heuristics with learned ones, to use dynamic, self-learning approaches to cost modeling, and to enrich traditional plan generation strategies by learning from the outcomes of previous planning instances, thereby dramatically reducing search time for future planning. Our work in this area investigates potential applications of Machine Learning to complement or even replace critical data engine components, such as the optimizer or the workload manager. We will also investigate opportunities to extend the technology to cross-platform environments.

  • Application domains across sectors. Data Science is already penetrating all sectors of financial and social activity. Application domains across different sectors may offer opportunities for multi-disciplinary research, addressing problems at an even larger scale than today. One example is the circular economy: in contrast to the 'take, make, dispose' production model of a linear economy, resource input, waste, emission, and energy leakage are minimized by narrowing material and energy loops through recycling, reuse, remanufacturing, repair, etc. Such a model extends to different sectors, including energy, manufacturing, transportation, finance, and the environment. Our aim is to build on state-of-the-art solutions in data science and big data analytics, and to provide novel methods to collect, process and analyze data in support of such application domains.