Implementation of suffix tree construction using Hadoop MapReduce

Alexandros Konstantinakis - Karmis
School of Electrical and Computer Engineering, National Technical University of Athens (NTUA)
2010
Diploma Thesis
Abstract. Suffix trees are an index structure widely used for sequences of biological data, and they are central to the search algorithms used in biology. In recent years, the volume of biological data produced by research has grown steadily. The first aim of this diploma thesis was to study the main algorithms for constructing suffix trees in memory, on disk, and on parallel systems. The second was to implement algorithms for the parallel construction of suffix trees using Hadoop MapReduce, based on the Trellis technique, the most efficient disk-based suffix tree construction method at the time of writing. Finally, experiments were conducted to evaluate the behaviour of these algorithms under parallel execution.