Enabling persistent identification of groups of duplicates in data aggregators

Giorgos Alexiou, Marios Meimaris, George Papastefanatos
7th International Workshop on Data Engineering meets the Semantic Web (hosted by 32nd International Conference on Data Engineering), May 16-20, 2016, Helsinki, Finland
Abstract. Data aggregators harvest, deduplicate, and make available content from disparate data sources in different domains, such as cultural, academic, and scientific content. The availability of aggregated data in the form of Linked Data is subject to the evolution of information at the data sources; proper handling is therefore necessary for published data to comply with Linked Data guidelines, such as persistent identification through time. In this paper we present the problem of disambiguating groups of duplicates in settings where the Information Space is regenerated as a whole in every harvesting cycle of data aggregation, and we propose an approach that aims at providing persistent identifiers for groups through time.