Efficient Calculation of Empirical P-values for Association Testing of Binary Classifications

Kostis Zagganas, Thanasis Vergoulis, Spiros Skiadopoulos, Theodore Dalamagas
SSDBM 2020
Abstract. Investigating whether two different classifications of a population are associated is an interesting problem in many scientific fields. For this reason, various statistical tests that reveal this type of association have been developed, the most popular being Fisher’s exact test. However, it has recently been shown that in some cases this test fails to produce accurate results. An alternative approach, known as randomization tests, was introduced to alleviate this issue; however, such tests are computationally intensive. In this paper, we introduce two novel indexing approaches that exploit frequently occurring patterns in classifications to avoid performing redundant computations during the analysis. We conduct a comprehensive set of experiments using real datasets and application scenarios to show that our approaches always outperform the state of the art, with one approach being faster by an order of magnitude.
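To make the cost of a randomization test concrete, the following is a minimal sketch of a naive baseline, not the indexing approaches introduced in the paper. It assumes each classification is a list of 0/1 labels over the same population, uses the count of members positive in both classifications as the test statistic, and applies the common add-one estimator for the empirical p-value. The function names `overlap` and `empirical_p_value` are hypothetical and chosen only for illustration.

```python
import random


def overlap(a, b):
    """Count members labelled positive in both binary classifications."""
    return sum(1 for x, y in zip(a, b) if x and y)


def empirical_p_value(a, b, n_permutations=1000, seed=0):
    """One-sided empirical p-value for the association between a and b.

    Assumption: the statistic is the overlap count, and the null
    distribution is obtained by shuffling the labels of b.
    """
    rng = random.Random(seed)
    observed = overlap(a, b)
    shuffled = list(b)  # copy so the caller's list is left untouched
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Each permutation recomputes the statistic from scratch.
        if overlap(a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    a = [1] * 10 + [0] * 10
    b = [1] * 10 + [0] * 10  # identical to a, so strongly associated
    print(empirical_p_value(a, b))
```

Every permutation rescans the whole population, so the running time grows with the population size multiplied by the number of permutations. This is the kind of repeated work that makes such tests computationally intensive.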