Extraction, integration and analysis of crowdsourced points of interest from multiple web sources

George Lamprianidis, Dimitrios Skoutas, George Papatheodorou, Dieter Pfoser
3rd ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, GeoCrowd 2014
Abstract. The amount of user-generated geospatial content on the Web is constantly increasing, making it a valuable source of information for enabling, enriching and enhancing geospatial applications and services. However, this content is highly heterogeneous and varies significantly in quality and accuracy, so extracting, integrating, and mining crowdsourced geospatial data from the Web is far from trivial. The main challenges include retrieving data from multiple sources, each with its own access methods and restrictions; dealing with different schemas and taxonomies; and finding matching entities across sources. In this paper, we present our work on retrieving and integrating crowdsourced Points of Interest (POIs) from popular Web sources. We describe the steps taken to map the source categories to a common schema, detect duplicate POIs, and finally cluster them to identify hotspots. We present the results of applying this process to six major Web sources to retrieve and integrate POIs located in the metropolitan areas of three European capital cities.
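To make the duplicate-detection step more concrete, the following is a minimal sketch of one plausible approach, not the method used in the paper. It assumes that two POIs from different sources are duplicates when they lie within a small distance of each other and have similar names. The `POI` class, the function names, and both thresholds (`max_dist_m`, `min_name_sim`) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class POI:
    """Hypothetical minimal POI record; real sources carry many more fields."""
    source: str
    name: str
    lat: float
    lon: float


def haversine_m(a: POI, b: POI) -> float:
    """Great-circle distance between two POIs in metres."""
    earth_radius_m = 6371000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(h))


def name_similarity(a: POI, b: POI) -> float:
    """Case-insensitive string similarity in [0, 1] using difflib."""
    return SequenceMatcher(None, a.name.lower().strip(),
                           b.name.lower().strip()).ratio()


def find_duplicates(pois, max_dist_m=50.0, min_name_sim=0.8):
    """Return index pairs of likely duplicates across different sources.

    Both thresholds are illustrative assumptions, not values from the paper.
    The pairwise scan is quadratic; a spatial index would be needed at scale.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(pois), 2):
        if a.source == b.source:
            continue  # only match entities across sources
        if (haversine_m(a, b) <= max_dist_m
                and name_similarity(a, b) >= min_name_sim):
            pairs.append((i, j))
    return pairs
```

Combining a spatial threshold with a name threshold keeps the sketch simple; in practice, category information after mapping to the common schema could serve as a further matching signal.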