A Knowledge Collection and Organization Tool with Web Meta-Search Mechanisms

Αργύρης Κόλλιας
School of Electrical and Computer Engineering, National Technical University of Athens (NTUA)
Diploma Thesis
Abstract. The scope of this diploma thesis is the development of a knowledge collection and organization application equipped with an advanced web meta-search mechanism. The work builds on FreeMind (http://freemind.sourceforge.net/), an open-source mind-mapping tool. While creating a map of interlinked thoughts, ideas and/or tasks (a mind map), the user can now search the World Wide Web for information on any topic of the graph and add the retrieved items to the map, thereby enriching the diagram. The search process can be adapted to the user’s needs; in particular, the user can choose the desired type of result (for example, web pages or papers). Scientific publications receive special attention: a database that is an exact copy of the one used by DBLP has been acquired, so that the final information is more reliable and complete. The system also retrieves results from several popular search engines, as well as from ad hoc ones, and presents them merged and ranked according to the “democratic” weighted Borda-Fuse algorithm. In addition, a Mozilla-based web browser has been integrated which, beyond ordinary web navigation, uses screen scraping so that even novice users can extend the system with further search engines. This is done by writing an appropriate .xml file whose structure is specified in a concise .dtd file.
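The merging step can be illustrated with a minimal sketch of a weighted Borda count. The scoring convention used here is an assumption made for illustration: a list of n results awards n points to its top result down to 1 point for its last, documents missing from a list receive nothing from it, and each engine's points are multiplied by a per-engine weight. The engine names, weights and the function name `weighted_borda_fuse` are hypothetical and not taken from the thesis.

```python
from collections import defaultdict


def weighted_borda_fuse(rankings, weights):
    """Merge ranked result lists with a weighted Borda count.

    rankings: dict mapping engine name -> list of result ids, best first
    weights:  dict mapping engine name -> weight (assumed; defaults to 1.0)
    """
    scores = defaultdict(float)
    for engine, results in rankings.items():
        n = len(results)
        w = weights.get(engine, 1.0)
        for position, doc in enumerate(results):
            # Top result earns n points, the next n - 1, ..., the last earns 1.
            scores[doc] += w * (n - position)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    rankings = {
        "engine_a": ["u1", "u2", "u3"],
        "engine_b": ["u2", "u1", "u4"],
        "engine_c": ["u3", "u2"],
    }
    weights = {"engine_a": 1.0, "engine_b": 0.8, "engine_c": 1.5}
    print(weighted_borda_fuse(rankings, weights))  # ['u2', 'u1', 'u3', 'u4']
```

Because every engine votes on every document it returns, a result ranked moderately well by many engines (here `u2`) can overtake one ranked first by a single engine, which is what makes the method “democratic”.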
Finally, wrapping a result page returned by a newly installed search engine comes down to three steps: predicting and composing the full search URL; the user’s labeling of the structural segments of a result (title, web link, summary/description); and machine learning, based on those labels, of how to read the three parts of every result.
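The three wrapping steps can be sketched in Python under several assumptions. The search URL template is inferred by submitting a known probe query and replacing its URL-encoded form with a placeholder. Extraction uses a simple left/right-delimiter (LR) wrapper in the style of Kushmerick’s wrapper induction, learned from at least two user-labeled values per field; this is a deliberately simple, swapped-in technique and not necessarily the learning method of the thesis. The helper names, the `max_len` cap on delimiter length (a crude guard against over-fitting to neighboring content), and the example URL and page are all hypothetical.

```python
import os.path
from urllib.parse import quote_plus


def make_url_template(observed_url, probe):
    """Replace the encoded probe query in an observed search URL with a placeholder."""
    encoded = quote_plus(probe)
    if encoded not in observed_url:
        raise ValueError("probe query not found in the observed URL")
    return observed_url.replace(encoded, "{query}")


def fill_template(template, query):
    """Compose the full search URL for a new query."""
    return template.replace("{query}", quote_plus(query))


def learn_delimiters(page, examples, max_len=8):
    """Learn left/right delimiters for one field from user-labeled values."""
    lefts, rights = [], []
    for value in examples:
        i = page.index(value)  # first occurrence of the labeled value
        lefts.append(page[:i])
        rights.append(page[i + len(value):])
    # Longest common suffix of the left contexts (common prefix of reversed strings).
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    # Longest common prefix of the right contexts.
    right = os.path.commonprefix(rights)
    # Keep only the characters closest to the labeled value.
    left, right = left[-max_len:], right[:max_len]
    if not left or not right:
        raise ValueError("labeled examples share no common delimiter")
    return left, right


def extract(page, left, right):
    """Return every substring enclosed by the learned delimiters, in page order."""
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        pos = end + len(right)
    return values


if __name__ == "__main__":
    template = make_url_template("http://search.example.org/find?q=mind+map&page=1", "mind map")
    print(fill_template(template, "borda fuse"))
    # http://search.example.org/find?q=borda+fuse&page=1
    page = ('<ul><li><a href="http://x.org/1">Mind maps</a><p>First snippet</p></li>'
            '<li><a href="http://x.org/2">Search engine</a><p>Second snippet</p></li>'
            '<li><a href="http://x.org/3">Rank fusion</a><p>Third snippet</p></li></ul>')
    left, right = learn_delimiters(page, ["Mind maps", "Search engine"])
    print(extract(page, left, right))  # ['Mind maps', 'Search engine', 'Rank fusion']
```

LR wrappers are easy to learn from a handful of labels but brittle: if the labeled examples happen to share extra surrounding text, the learned delimiters may not match other results, which is why a real system would validate them against the whole page.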