Discovery and Integration of Data and Services in the Semantic Web
PhD Thesis, School of Electrical and Computer Engineering, National Technical University of Athens
2008
- Contact persons: Dimitris Skoutas, Timos Sellis
Abstract.
The Web constitutes a universal repository providing a huge amount of information on a variety of topics and in a variety of formats. At the same time, the number of users has increased significantly, their participation has become more active, and their needs have become more complex. Thus, new trends arise, emphasizing the need for integration and collaboration. To address these challenges, considerable research effort has been devoted to the transition to the Semantic Web, which will enhance the current Web with formal and explicit metadata, promising to facilitate interoperability and to increase automation in searching, managing, and sharing information.
In this direction, this thesis studies the problem of searching for relevant services and data on the Semantic Web, as well as integrating information from heterogeneous sources to meet specific needs and requirements. First, we study the problem of Web service discovery. We propose a similarity measure for comparing service descriptions, using the semantic information conveyed by the ontologies used to annotate these descriptions. We also develop techniques, drawing on concepts related to skyline queries, for ranking available services under diverse user preferences and multiple matching criteria. Then, we study the search for services and data in distributed environments, considering peer-to-peer networks where the available resources are semantically annotated. We propose an approach for efficient and progressive search of services in a structured peer-to-peer overlay network, and a method to facilitate the sharing of structured data in an ontology-enhanced peer data management system. Finally, we propose techniques to facilitate the conceptual design of Extract-Transform-Load (ETL) processes, which are critical for reconciling information from several heterogeneous sources. These techniques also rely on the use of ontologies to identify