Challenges and Opportunities in the Evolving Data Web

George Papastefanatos
1st International Workshop on Modeling and Management of Big Data (MoBiD), November 13, 2013, Hong Kong

The Linked Data paradigm is a promising approach for publishing, sharing, and connecting large volumes of structured, interrelated information on the Web. The Data Web refers to the vast and rapidly increasing quantity of scientific, corporate, government, and crowd-sourced data published as Linked Open Data. This form of publishing encourages the uniform representation of heterogeneous data items on the Web (usually through commonly agreed vocabularies expressed in RDF) and the creation of interlinks between them. The growing availability of open linked datasets provides new perspectives for data integration and interoperability, new usage scenarios, and novel applications.

This paper will highlight several opportunities and challenges brought forward by this promising direction for publishing web data. We will focus on approaches to publishing and modeling Linked Open Data (LOD), drawing on our experience from two case studies. The first concerns the publishing of multidimensional statistical data as LOD; the second concerns the publishing of scientific data, namely genomic and experimental data related to microRNA biomolecules.

We will then turn to the evolution and preservation of data published on the Data Web. The proliferation of distributed, interlinked data sources poses significant new challenges for consistently managing the vast number of potentially large datasets and their interdependencies over time. We will discuss these challenges and present possible research directions.