Simplifying p-value calculation for the unbiased microRNA enrichment analysis, using ML-techniques

Konstantinos Zagganas, Maria Lioli, Thanasis Vergoulis, Theodore Dalamagas
EDBT/ICDT Workshops 2021
Abstract. Investigating the role of small biomolecules called microRNAs in biological functions is a very popular topic in bioinformatics, since microRNAs have been shown to offer novel therapeutic options for diseases such as cancer and Hepatitis C. To predict the involvement of microRNAs in biological functions, many statistical approaches involving p-value calculations have been used, the most popular being Fisher’s exact test. However, it has been shown that the data distribution does not match any of the theoretical distributions assumed by these approaches, so an empirical randomization approach is preferred. Such analyses, however, are computationally intensive. In this paper, we present a novel approach to microRNA enrichment analysis that uses Machine Learning techniques to predict p-values instead of calculating them through randomization experiments. This simplifies the work of bioinformatics data analysts, helping them perform multiple enrichment analysis tasks efficiently.
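To make the cost of the randomization approach concrete, the following minimal Python sketch estimates an empirical p-value for one microRNA set and one functional category. It is an illustration under stated assumptions, not the authors' implementation: the function name `empirical_p_value`, the dictionary layout of `mirna_targets`, the choice of test statistic (number of category genes targeted by the set), and the add-one correction are all hypothetical choices made for this example.

```python
import random


def empirical_p_value(mirna_targets, query_mirnas, category_genes,
                      n_iter=1000, seed=0):
    """Estimate an enrichment p-value by randomization (illustrative sketch).

    mirna_targets: dict mapping each microRNA name to a set of target genes
                   (assumed input layout).
    query_mirnas:  the microRNAs of interest.
    category_genes: genes annotated to the functional category being tested.
    """
    rng = random.Random(seed)           # seeded for reproducibility
    all_mirnas = sorted(mirna_targets)  # sorted so sampling is deterministic
    category = set(category_genes)

    def overlap(mirnas):
        # Test statistic: number of category genes targeted by the set.
        targeted = set()
        for m in mirnas:
            targeted.update(mirna_targets[m])
        return len(targeted & category)

    observed = overlap(query_mirnas)

    # Draw random microRNA sets of the same size and count how often
    # they reach at least the observed overlap.
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(all_mirnas, len(query_mirnas))
        if overlap(sample) >= observed:
            hits += 1

    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_iter + 1)


# Toy data, invented purely for illustration.
toy_targets = {
    "m1": {"g1", "g2"},
    "m2": {"g3"},
    "m3": {"g4"},
    "m4": {"g5"},
}
p = empirical_p_value(toy_targets, ["m1"], {"g1", "g2"})
print(p)
```

Every tested category needs `n_iter` random draws, each of which recomputes the overlap, so the cost grows with the number of categories and iterations. This is the computation that a learned model predicting p-values would aim to avoid.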