A Cloud-Native NGS Data Processing and Annotation Platform

Giannis Mouchakis, Babis Kostopoulos, Stasinos Konstantopoulos, Ilias Kanellos, Anargiros Tzerefos, Thanasis Vergoulis, Thodoris Dalamagas
Poly/DMAH@VLDB 2021: 121-132
Abstract. Low-cost and widely available Next-Generation Sequencing (NGS) is revolutionizing clinical practice, paving the way for the realization of precision medicine. Applying NGS to clinical practice requires establishing a complex loop that involves sample collection and sequencing, computational processing of the NGS outputs to identify variants, and interpretation of the variants to establish their significance for the condition being treated. The computational tools that perform variant calling have been used extensively in bioinformatics, but there have been few attempts to integrate them into a comprehensive, production-grade, Cloud-native infrastructure able to scale to national levels. Furthermore, there are no established interfaces for closing the loop between NGS machines, computational infrastructure, and variant interpretation experts. We present the platform developed for the Greek National Precision Medicine Network for Oncology. The platform integrates bioinformatics tools and their orchestration, makes provisions for both experimental and clinical usage of variant calling pipelines, provides programmatic interfaces for integration with NGS machines and for analytics, and provides user interfaces that support variant interpretation. We also present benchmarking results and discuss how they confirm the soundness of our architectural and implementation choices.