Techniques for ranking search engine results based on user history

Αντώνιος Νικολαΐδης
School of Electrical and Computer Engineering, National Technical University of Athens (ΕΜΠ)
2009
Diploma Thesis
Abstract. This thesis designs and develops techniques that improve the ranking of search engine results by personalizing it for each user. How useful a search engine is to a user depends on how closely the results presented match those the user expected to see. Most search engines rank results so that the most relevant ones appear at the top. As use of the Internet and of search engines grows, the traditional approach of a single ranking for all users is becoming less satisfactory. Our application addresses this problem by personalizing the results. The re-ranking is based on the history of the user's activity while searching the Internet. The history contains only clickstream data, i.e. the sequence of clicks the user made while using the search engine. By analyzing this data, we draw conclusions about the user's preferences. The user does not need to state preferences explicitly; they are inferred implicitly, on the assumption that clicking a result while deliberately skipping others that appear higher in the initial ranking means the user prefers the clicked result. At the same time, we record the characteristics of every result displayed to the user, whether or not it was clicked. Once we have collected a sufficient number of such preferences, we feed them, together with the characteristics of the results, to a Support Vector Machine learning algorithm. The algorithm produces a trained model that captures the characteristics of the results the user prefers. We can then use this model to re-rank the results returned by the search engine so that they are personalized for that user.
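The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the implementation from the thesis: the function names, the two-dimensional feature vectors, and the hyperparameters are all hypothetical, and a plain subgradient-descent linear SVM on pairwise differences stands in for whatever SVM package the thesis used. The sketch derives "clicked result preferred over skipped results above it" pairs from one result page, trains a weight vector on those pairs, and re-ranks the results by their learned scores.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (preferred result id, less-preferred result id)


def extract_preferences(ranking: Sequence[str], clicked: Set[str]) -> List[Pair]:
    """A clicked result is preferred over every unclicked result ranked above it."""
    pairs: List[Pair] = []
    for pos, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:pos]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs


def train_pairwise_svm(
    features: Dict[str, Sequence[float]],
    pairs: Sequence[Pair],
    epochs: int = 200,   # assumed value, not from the thesis
    lr: float = 0.1,     # assumed learning rate
    reg: float = 0.01,   # assumed L2 regularization strength
) -> List[float]:
    """Linear SVM on feature differences, trained by subgradient descent on hinge loss."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(dim):
                # The hinge term contributes only while the pair violates the margin.
                grad = reg * w[i] - (diff[i] if margin < 1.0 else 0.0)
                w[i] -= lr * grad
    return w


def rerank(ranking: Sequence[str], features: Dict[str, Sequence[float]],
           w: Sequence[float]) -> List[str]:
    """Sort results by descending learned score w . x."""
    def score(doc: str) -> float:
        return sum(wi * xi for wi, xi in zip(w, features[doc]))
    return sorted(ranking, key=score, reverse=True)


# Hypothetical result page: the user clicked only "c", skipping "a" and "b".
ranking = ["a", "b", "c", "d"]
features = {  # made-up characteristics of each displayed result
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.2, 1.0],
    "d": [0.1, 0.9],
}
pairs = extract_preferences(ranking, {"c"})
weights = train_pairwise_svm(features, pairs)
print(rerank(ranking, features, weights))
```

Note that characteristics are recorded for all displayed results, clicked or not, which is why the unclicked result "d" can still be re-ranked by the model even though it appears in no preference pair.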