Department of Big Data Research Infrastructures

The department addresses research, technological, and innovation challenges in the field of systems and infrastructures for the organization, storage, curation, and management of large data volumes, in support of a variety of important application domains. Key activities address efficiency and scalability issues for digital research infrastructures (RIs), including techniques and systems for complex information flow processing tailored to heterogeneous computation and data storage environments. IMSI also has a strong focus on producing reliable, high-quality digital assets, facilitating their archiving and long-term maintenance, and uncovering their added value via knowledge extraction tasks. Our work has a strong interdisciplinary aspect: it provides solutions for the effective exploitation of Big Data technologies in scientific areas, and the department has a leading role in existing European and national RIs for several scientific domains. We carry out R&I activities to build scalable data infrastructures, either tailor-made for specific scientific domains (e.g., Health, Humanities) or generic enough for any RI (Generic Data Infrastructures, Open Science).

The recent activities of our group are balanced between research and the development of innovative applications, and are outlined below.

Generic Data infrastructures. HELIX is the only horizontal e-Infrastructure of the Greek National Roadmap for Research Infrastructures, and is now in full operation as a sustainable infrastructure for data-intensive research and innovation, providing its services to scientists, researchers, and the industry at large. HELIX materializes the vision of our national Data Economy, promoting scientific advancement and economic growth. HELIX is founded on a triple helix of scientific practices, tightly interwoven to produce knowledge and innovation: Data, Publications, Digital Laboratories. HELIX allows researchers to tap into ready-to-use Big Data frameworks and data collections, and provides Jupyter as a hosted service with streamlined access to published data, running over highly scalable execution environments. Data Management Plans are essential for researchers, and HELIX helps implement them. The HELIX infrastructure is available at https://data.hellenicdataservice.gr.

Open science. OpenAIRE is a member-based, non-profit organisation hosted at Athena RC. With 47 members in key European universities and national infrastructures, OpenAIRE’s operations provide the glue for many of the EOSC user- and research-driven functionalities. OpenAIRE’s technical infrastructure comprises: the Services Layer (all services in institutional, national, European/RIs settings), the Content Interoperability Layer (based on a compliance framework that applies rules for how the data elements are published and re-used, and implements the Open Research Graph), the Access Interoperability Layer (ensures a uniform delivery of services and data for EOSC) and the Monitoring Layer. The OpenAIRE Research Graph (graph.openaire.eu) is a linked open dataset of Europe’s research outcomes, automatically curated via AI-enabled processes, which presents operational scalability challenges. It is a key asset in EOSC (and globally) and a monitoring tool for Open Science uptake. BIP! Toolbox (https://bip.imsi.athenarc.gr/) is a set of services and resources that leverage the OpenAIRE Research Graph to facilitate research impact assessment for research outputs and individual researchers. The OpenAIRE data anonymization tool Amnesia (https://amnesia.openaire.eu), an Athena RC product hosted and promoted by OpenAIRE, is a flexible data anonymization tool that transforms relational and transactional databases to datasets where formal privacy guarantees hold (k-anonymity and k^m-anonymity).
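To make the k-anonymity guarantee mentioned above concrete, the following is a minimal Python sketch of how one might check the property on an already generalized table. It is not Amnesia's code or API: the function name `is_k_anonymous`, the column names, and the sample records are hypothetical and chosen only for illustration. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter
from typing import Hashable, Iterable, Mapping, Sequence


def is_k_anonymous(
    records: Iterable[Mapping[str, Hashable]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """Return True if every quasi-identifier combination occurs in at least k records."""
    # Group records by their tuple of quasi-identifier values and count group sizes.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records (age ranges, truncated postal codes).
records = [
    {"age": "30-39", "zip": "115**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "115**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "116**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "116**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True: each group has 2 records
print(is_k_anonymous(records, ["age", "zip"], 3))  # False: no group has 3 records
```

The check only verifies the property; producing a k-anonymous dataset requires generalizing or suppressing values, which is the task an anonymization tool performs.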

Health. We have designed and developed service catalogs, tools and data infrastructures to support data-intensive biomedical workflows and research activities, serving a range of domains from genomics and structural biology to medicine. In ELIXIR-GR, a major outcome is Hypatia (https://hypatia.athenarc.gr), a cloud infrastructure that supports the computational needs of, and provides a broad spectrum of bioinformatics services to, not only the ELIXIR-GR RI but also the broader life science community. Hypatia hosts important ELIXIR-GR workflows, tools and biological databases, among them the national COVID19 Data Portal of Greece (https://covid19dataportal.gr/). For Inspired-Ris, the RI in the field of Structural Biology that combines studies on bioactive (macro)molecule interactions and biomarker identification, we have developed the core catalog of services in the field of biology, diagnostics and pharmacology (https://inspired-ris.catalogue.athenarc.gr). Finally, for the Hellenic Precision Medicine Network on Cancer (https://oncopmnet.gr), we have designed and developed Vast, a platform tailor-made for the Network, to support large-scale, collaborative, knowledge-driven gene variant annotation and interpretation for clinical oncology.

Humanities and Digital Curation. At the national level, the Department leads the APOLLONIS Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation (P. Constantopoulos, coordinator), which resulted from the unification of CLARIN-EL and DARIAH-GR and reaches broader communities with interoperable services. At the European level, the Department has actively participated in the European Digital Research Infrastructure for the Arts and Humanities (DARIAH) since the preparatory phase, currently with leading roles in VCC2, DARIAH’s Virtual Competence Centre for Research and Education. It has also been heavily involved in building the ARIADNE infrastructure for archaeology, in providing advanced aggregation services for the Europeana ecosystem, and in contributing to the Research Data Alliance (RDA). It has developed the ESF NeDiMAH methods ontology (NeMO), now used to drive the automatic extraction of research processes. More recently, work in process modeling and service interoperability has been undertaken for designing the Competence Centre for the Conservation of Cultural Heritage in the 4CH project.