32nd International Conference on Scientific and Statistical Database Management, July 2020, Article No. 20, pp. 1–4.
Abstract. In this paper, we introduce an approach for improving the quality of the geocoding process. Geocoding refers to the procedure of mapping a textual address to a pair of accurate spatial coordinates. While a variety of geocoders, both open source and commercial, perform this mapping in either a semi-automated or a fully automated way, there is no one-size-fits-all system. Depending on its underlying algorithm, a geocoder may be very accurate for some addresses, districts or countries, while failing to properly locate others. Our setup can therefore be thought of as a meta-geocoding pipeline, built on top of the available geocoders. We propose a machine learning approach that, given an address and a sequence of coordinate pairs suggested by standalone geocoders, identifies the most accurate one. To achieve this, we formulate the task as a multi-class classification problem and introduce a series of domain-specific training features that capture essential information about each suggested coordinate pair and compute comparative metrics among different suggestions. These features are fed into several classification algorithms, which are evaluated on a proprietary address dataset of a geo-marketing company. Furthermore, we present LGM-GC, a QGIS plugin that provides the functionality of our approach through a user-friendly interface.
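To make the multi-class formulation concrete, the following is a minimal sketch, assuming each training example consists of the coordinate pairs returned by K geocoders plus a ground-truth location. The class label is the index of the geocoder whose suggestion lies closest to the truth, and one example of a comparative metric is each suggestion's mean great-circle distance to the other suggestions. The function names (`haversine_m`, `comparative_features`, `best_geocoder_label`) and this particular feature are hypothetical illustrations, not the feature set used in the paper.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def comparative_features(suggestions: List[Coord]) -> List[float]:
    """Hypothetical comparative feature: for each suggestion, the mean
    distance to all other suggestions. A low value means the other
    geocoders largely agree with that suggestion."""
    feats = []
    for i, s in enumerate(suggestions):
        others = [haversine_m(s, t) for j, t in enumerate(suggestions) if j != i]
        feats.append(sum(others) / len(others) if others else 0.0)
    return feats


def best_geocoder_label(suggestions: List[Coord], truth: Coord) -> int:
    """Training label (class): index of the geocoder closest to the ground truth."""
    dists = [haversine_m(s, truth) for s in suggestions]
    return min(range(len(dists)), key=dists.__getitem__)


# Usage: three hypothetical geocoder outputs for one address.
suggestions = [(37.9838, 23.7275), (37.9840, 23.7280), (38.2466, 21.7346)]
truth = (37.9839, 23.7276)
features = comparative_features(suggestions)   # one feature per geocoder
label = best_geocoder_label(suggestions, truth)  # class index for training
```

The resulting `(features, label)` pairs could then be passed to any off-the-shelf multi-class classifier; at prediction time, only the features are available and the predicted class selects which geocoder's coordinates to return.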