A Supervised Skyline-Based Algorithm for Spatial Entity Linkage
Published in Proceedings of the 25th International Conference on Extending Database Technology (EDBT), 29th March-1st April, 2022, ISBN 978-3-89318-085-7
2022
Conference/Workshop
- Contact persons: Giorgos Giannopoulos, Vassilis Kaffes
- Relevant research project: LinkGeoML
Abstract.
The ease of publishing data on the web has led to larger volumes
and more diverse types of data. Entities that refer to a physical
place and are characterized by a location and various attributes
are called spatial entities. The growing amount of spatial entity
data from multiple sources facilitates the development of richer,
more accurate, and more comprehensive geospatial applications and
services, but it also brings unavoidable redundancy and ambiguity.
We address the problem of spatial entity
linkage with SkylineExplore-Trained (SkyEx-T), a skyline-based
algorithm that can label an entity pair as being the same physical
entity or not. We introduce LinkGeoML-eXtended (LGM-X), a
meta-similarity function that computes similarity features tailored
to the characteristics of spatial entities. The skylines
of SkyEx-T are created using a preference function, which ranks
the pairs based on the likelihood of referring to the same entity.
We propose deriving the preference function using a tiny training
set (down to 0.05% of the dataset). Additionally, we provide a
theoretical guarantee for the cut-off that can best separate the
classes, and we show experimentally that it results in a near-optimal
F-measure (on average, only a 2% loss). SkyEx-T yields an
F-measure of 0.71-0.74 and beats the existing non-skyline-based
baselines by a margin of 0.11-0.39 in F-measure. Compared to
machine learning techniques, SkyEx-T achieves similar accuracy
(sometimes slightly better with very small training sets) and,
more importantly, has no parameters to tune and yields a model
that is explainable by design (no further steps are needed to
achieve explainability).
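To make the skyline idea more concrete, here is a minimal Python sketch. It is not the authors' SkyEx-T implementation. It assumes that every similarity feature is "higher is better" and uses plain Pareto dominance as the preference, whereas the abstract states that SkyEx-T derives its preference function from a small training set. The sketch peels successive skyline layers from the candidate pairs' similarity vectors and labels the pairs in the first k layers as matches. The cut-off k, the function names (`dominates`, `skyline_layers`, `label_pairs`), and the two example features are hypothetical. The paper's theoretically grounded cut-off is not reproduced here.

```python
from typing import List, Sequence, Tuple

# A similarity vector for one candidate pair (e.g., hypothetical name and proximity scores).
Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b on every feature and strictly better on one.

    Assumption: higher values are preferred on every feature (plain Pareto dominance),
    a simplification of SkyEx-T's trained preference function.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: Sequence[Vector]) -> List[List[int]]:
    """Peel successive skylines and return them as lists of point indices.

    Layer 0 is the skyline of all pairs, layer 1 the skyline of what remains, etc.
    Each layer is non-empty because dominance is a strict partial order.
    """
    remaining = list(range(len(points)))
    layers: List[List[int]] = []
    while remaining:
        # A pair is on the current skyline if no other remaining pair dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(front)
        front_set = set(front)
        remaining = [i for i in remaining if i not in front_set]
    return layers


def label_pairs(points: Sequence[Vector], k: int) -> List[bool]:
    """Label pairs in the first k skyline layers as matches (k is a hypothetical cut-off)."""
    labels = [False] * len(points)
    for layer in skyline_layers(points)[:k]:
        for i in layer:
            labels[i] = True
    return labels


# Hypothetical similarity vectors (name similarity, proximity score) for five candidate pairs.
pairs = [(0.9, 0.8), (0.7, 0.95), (0.6, 0.6), (0.2, 0.3), (0.65, 0.5)]
print(skyline_layers(pairs))  # [[0, 1], [2, 4], [3]]
print(label_pairs(pairs, 1))  # [True, True, False, False, False]
```

Peeling layers this way costs quadratic time per layer. That is fine for illustration, but a real linkage pipeline would need blocking or a more efficient skyline algorithm to handle large candidate sets.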