IMSI, Publication Details: Approximate Regional Sequence Matching for Genomic Databases


Approximate Regional Sequence Matching for Genomic Databases

T. Vergoulis, T. Dalamagas, D. Sacharidis, T. Sellis

Volume 21, Issue 6, pp 779-795

2012

Journal

Contact persons: Thanasis Vergoulis, Theodore Dalamagas, Timos Sellis, Dimitris Sacharidis

Abstract. Recent advances in computational biology have raised sequence matching requirements that give rise to new types of sequence database problems. In this work, we introduce an important class of such problems, the approximate regional sequence matching (ARSM) problem. Given a data sequence and a pattern sequence, an ARSM result is an approximate occurrence of a region of the pattern in the data sequence that satisfies two conditions. First, the region must contain a predetermined area of the pattern sequence, termed the core. Second, the allowable deviation between the region of the pattern and its occurrence in the data sequence depends on the length of the region. We propose the PS-ARSM method, which processes the regions of a pattern holistically, taking advantage of their overlaps to identify the ARSM results efficiently. Its performance is evaluated against existing techniques adapted to the ARSM problem.
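
To make the problem statement concrete, the following is a minimal brute-force sketch of the two ARSM conditions. It is not the PS-ARSM method: it checks every region independently and ignores the overlaps that PS-ARSM exploits. Several details are assumptions made only for illustration, since the abstract does not fix them: deviation is measured as Hamming distance (substitutions only), the length-dependent threshold is one mismatch per `symbols_per_error` symbols of the region, and the names `naive_arsm`, `core_start`, `core_end`, and `symbols_per_error` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_arsm(data, pattern, core_start, core_end, symbols_per_error):
    """Brute-force illustration of the ARSM conditions (assumed formulation).

    The core is pattern[core_start:core_end]. Returns tuples
    (region_start, region_end, data_position, distance).
    """
    results = []
    # Condition 1: every candidate region must contain the core, so it
    # starts at or before core_start and ends at or after core_end.
    for region_start in range(core_start + 1):
        for region_end in range(core_end, len(pattern) + 1):
            region = pattern[region_start:region_end]
            # Condition 2: the allowed deviation depends on region length.
            # Assumption: one mismatch per `symbols_per_error` symbols.
            max_errors = len(region) // symbols_per_error
            for pos in range(len(data) - len(region) + 1):
                distance = hamming(region, data[pos:pos + len(region)])
                if distance <= max_errors:
                    results.append((region_start, region_end, pos, distance))
    return results


if __name__ == "__main__":
    # Pattern "ACGTAG" with core "CGT" (indices 1..3), one error per 5 symbols.
    for hit in naive_arsm("TTACGTACC", "ACGTAG", 1, 4, 5):
        print(hit)
```

Because overlapping regions share most of their symbols, this sketch repeats a great deal of comparison work across regions, which is the redundancy that a holistic treatment of the regions is meant to avoid.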