Automatic Clustering by Detecting Significant Density Dips in Multiple Dimensions

Pantelis Chronis, Spiros Athanasiou, Spiros Skiadopoulos
19th IEEE International Conference on Data Mining (ICDM 2019)
Abstract. Clustering algorithms find groups of similar items in a dataset. Automatic clustering algorithms achieve this without requiring users to supply critical parameters. A recent automatic clustering methodology uses Hartigan's dip test to detect significant peaks in the distribution of a dataset. However, this test can only detect peaks in the distribution of a one-dimensional variable. To cluster in multiple dimensions, algorithms of this methodology therefore rely on one-dimensional transformations of the dataset, which limits their effectiveness. In this paper, we present M-Dip, an automatic clustering algorithm that works directly in multi-dimensional space. Like the methods based on Hartigan's dip test, M-Dip assumes that clusters correspond to different peaks in the distribution of the dataset, and it separates clusters at the dips that form between neighboring peaks. Dips are detected directly in multi-dimensional space, using a graph-based method, and their statistical significance is evaluated through appropriate simulations. Our experimental evaluation indicates that M-Dip achieves significantly better results than existing algorithms based on Hartigan's dip test, as well as other state-of-the-art automatic clustering algorithms.
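
The idea of evaluating a dip's statistical significance through simulation can be illustrated with a minimal one-dimensional sketch. This is not M-Dip's graph-based multi-dimensional procedure, which the abstract does not specify. The function names `dip_depth` and `dip_p_value`, the histogram-based dip statistic, the bin count, and the choice of a uniform reference distribution are all assumptions made for illustration only.

```python
import random


def dip_depth(values, bins=10):
    """Hypothetical dip statistic: the deepest histogram valley, measured
    against the lower of the highest bins on either side, as a fraction of n."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    best = 0
    for i in range(1, bins - 1):
        left_peak = max(counts[:i])
        right_peak = max(counts[i + 1:])
        best = max(best, min(left_peak, right_peak) - counts[i])
    return best / len(values)


def dip_p_value(values, simulations=200, bins=10, seed=0):
    """Monte Carlo p-value: how often a sample with no real dip (assumed
    here to be uniform on the data range) shows a dip at least as deep."""
    rng = random.Random(seed)
    observed = dip_depth(values, bins)
    lo, hi = min(values), max(values)
    n = len(values)
    exceed = 0
    for _ in range(simulations):
        sample = [rng.uniform(lo, hi) for _ in range(n)]
        if dip_depth(sample, bins) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (simulations + 1)
```

Under these assumptions, a small p-value suggests that the valley between two peaks is unlikely to be a sampling artifact, so splitting the data at that valley into two clusters would be justified; a large p-value suggests the data should remain a single cluster.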