Top-k Dominant Web Services under Multi-Criteria Matching

Dimitrios Skoutas, Dimitris Sacharidis, Alkis Simitsis, Verena Kantere, Timos Sellis
In Proceedings of the 12th International Conference on Extending Database Technology (EDBT '09)
Abstract. As we move from a Web of data to a Web of services, enhancing the capabilities of current Web search engines with effective and efficient techniques for Web service retrieval and selection becomes an important issue. Traditionally, the relevance of a Web service advertisement to a service request is determined by computing an overall score that aggregates individual matching scores among the various parameters in their descriptions. Two drawbacks characterize such approaches. First, there is no single matching criterion that is optimal for determining the similarity between parameters. Instead, there are numerous approaches, ranging from Information Retrieval similarity metrics to semantic, logic-based inference rules. Second, the reduction of individual scores to an overall similarity leads to significant information loss. Since there is no consensus on how to weight these scores, existing methods are typically pessimistic, adopting a worst-case scenario. As a consequence, several services, e.g., those having a single unrelated parameter, can be excluded from the result set, even though they are potentially good alternatives. In this work, we present a methodology that overcomes both deficiencies. Given a request, we introduce an objective measure that assigns a dominance score to each advertised Web service. This score takes into consideration all the available criteria for each parameter in the request. We investigate three distinct definitions of dominance score, and we devise efficient algorithms that retrieve the top-k most dominant Web services in each case. Extensive experimental evaluation on real requests and relevance sets, as well as on synthetically generated scenarios, demonstrates both the effectiveness of the proposed technique and the efficiency of the algorithms.
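The abstract does not spell out the three dominance score definitions, so the following Python sketch illustrates only the general idea under stated assumptions. It assumes each service is described by one match vector per matching criterion, with one degree of match per request parameter (higher is better), and it assumes a simple score that counts how many match vectors of other services a service's vectors dominate. The names `dominates`, `dominance_score`, `top_k_dominant`, and `top_k_worst_case`, as well as the sample data, are hypothetical, and the brute-force pairwise comparison stands in for, and is not, the efficient algorithms the paper refers to.

```python
from typing import Dict, List, Sequence

# One match vector per criterion: a degree of match for each request parameter.
Vector = Sequence[float]


def dominates(u: Vector, v: Vector) -> bool:
    """True if u is at least as good as v everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def dominance_score(name: str, services: Dict[str, List[Vector]]) -> int:
    """Assumed score: number of (own vector, other service's vector) pairs
    in which the own vector dominates. Not the paper's definition."""
    return sum(
        dominates(u, v)
        for other, vectors in services.items()
        if other != name
        for u in services[name]
        for v in vectors
    )


def top_k_dominant(services: Dict[str, List[Vector]], k: int) -> List[str]:
    """Brute-force top-k by the assumed dominance score (ties broken by name)."""
    scores = {name: dominance_score(name, services) for name in services}
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


def top_k_worst_case(services: Dict[str, List[Vector]], k: int) -> List[str]:
    """Pessimistic baseline: rank by the single worst degree of match."""
    scores = {n: min(min(v) for v in vecs) for n, vecs in services.items()}
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical data: 3 services, 2 criteria each, 3 request parameters.
services = {
    "A": [(0.9, 0.9, 0.1), (0.8, 0.9, 0.2)],  # strong, but one weak parameter
    "B": [(0.5, 0.5, 0.5), (0.4, 0.5, 0.4)],
    "C": [(0.3, 0.4, 0.2), (0.3, 0.3, 0.3)],
}

print(top_k_dominant(services, 2))
print(top_k_worst_case(services, 2))
```

On this toy data, service A has one poorly matched parameter, so the worst-case ranking drops it from the top 2 in favor of C, whereas the dominance-based ranking keeps it because one of its match vectors still dominates a vector of C.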