Learning domain specific models for toponym interlinking

Vassilis Kaffes, Giorgos Giannopoulos, Nikos Karagiannakis, Nontas Tsakonas
Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. November 2019. Pages 504–507
Abstract. Interlinking spatio-textual data is a core problem in the research literature, as well as a task of high practical importance in a plethora of industrial applications involving GIS. In its general form, it consists of identifying, across two sources of spatio-textual entities, pairs of entities that match, i.e. that correspond to the same real-world entity. In this paper, we focus on interlinking spatio-textual entities based solely on their names, that is, we handle the problem of toponym interlinking. To solve this problem, existing works exploit generic string similarity measures and either apply them as is or integrate them as training features in classification models, without adapting or extending them to the specific characteristics of toponyms. In this work, we show that domain knowledge can significantly improve the accuracy of toponym interlinking, by proposing domain-specific similarity measures that take into account the specificities of toponyms. We assess the implemented measures on Geonames and demonstrate significant increases in interlinking accuracy compared to baseline methods.
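To make the contrast between generic and domain-aware string similarity more concrete, here is a minimal sketch in Python. It is not the authors' method. The generic baseline is `difflib.SequenceMatcher.ratio()` from the standard library, applied to the raw names. The "domain-aware" variant is a hypothetical illustration that does three things: it normalizes accents and punctuation, expands a few abbreviations, and scores the base name separately from frequent toponym terms such as "mount" or "lake". The abbreviation table, the frequent-term list, and the `w_base` weight are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
import unicodedata

# Hypothetical abbreviation table and frequent-term list (illustrative, not from the paper).
ABBREVIATIONS = {"mt": "mount", "ft": "fort"}
FREQUENT_TERMS = {"mount", "fort", "lake", "river", "the", "of"}


def normalize(name: str) -> list[str]:
    """Lowercase, strip accents and punctuation, then expand known abbreviations."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents.lower())
    return [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]


def generic_similarity(a: str, b: str) -> float:
    """Baseline: character-level similarity on the lowercased raw names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def domain_similarity(a: str, b: str, w_base: float = 0.8) -> float:
    """Hypothetical domain-aware score: base names and frequent terms are compared separately."""
    ta, tb = normalize(a), normalize(b)
    base_a = " ".join(t for t in ta if t not in FREQUENT_TERMS)
    base_b = " ".join(t for t in tb if t not in FREQUENT_TERMS)
    freq_a = " ".join(t for t in ta if t in FREQUENT_TERMS)
    freq_b = " ".join(t for t in tb if t in FREQUENT_TERMS)
    base_sim = SequenceMatcher(None, base_a, base_b).ratio()
    # If neither name has a frequent term, treat that part as a perfect match.
    freq_sim = SequenceMatcher(None, freq_a, freq_b).ratio() if (freq_a or freq_b) else 1.0
    return w_base * base_sim + (1 - w_base) * freq_sim


if __name__ == "__main__":
    for a, b in [("Mt. Everest", "Mount Everest"), ("Zürich", "Zurich"), ("Lake Geneva", "Geneva")]:
        print(f"{a!r} vs {b!r}: generic={generic_similarity(a, b):.3f} "
              f"domain={domain_similarity(a, b):.3f}")
```

In this sketch, "Mt. Everest" and "Mount Everest" score below 1.0 under the generic measure. Under the domain-aware variant they score essentially 1.0, because the abbreviation is expanded and the shared base name dominates the weighted score. Scores like these could be used as a threshold-based matcher or as features in a classifier, the two usage modes the abstract mentions.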