Please use this identifier to cite or link to this item:
A Genomic Sequence Matching Algorithm with Edge-base Quadtree Pattern Matching
|Authors: ||Chih-Yuan Chen|
|Contributors: ||NTOU:Department of Electrical Engineering|
Genomic Sequence;Quadtree;tree pattern match
|Issue Date: ||2011-06-22T08:53:08Z
|Abstract: ||隨著DNA資料庫的產生，尋找相似序列的比對演算法成為許多研究人員所努力的目標。傳統的字串比對演算法，如Boyer More和Knuth-Morris-Pratt (KMP)，在處理基因字串的比對上有一些缺點，例如效率較差、比對過程過於複雜，以及占用太多記憶體空間等。因此，許多針對基因序列比對的演算法相繼被提出，如 Suffix tree和ACGT-words tree等。 本論文針對Suffix tree太佔記憶體空間的問題及ACGT-words tree比對過程過於煩瑣的問題來加以改善。此外，我們的方法還能提供一個上述兩種基因字串比對演算法都無法處理的功能，即Multi-Pattern搜尋。|
Since the inception of Human Genome Project back in 1990s researchers in multiple disciplines have been seeking for highly efficient sequence alignment algorithms. Because a genome sequence can be regarded as a text string, many string matching algorithms can indeed be use for the genome sequencing tasks. However, many well-known, traditional string matching algorithms such as the Boyer More and the Knuth-Morris-Pratt (KMP), when used to match the genome sequences are often plagued by inefficiency, computational complexity, and consumption of huge memory space. Therefore, algorithms tailored for genomic sequencing have attracted a new generation of algorithm designers. In this thesis, we focus our attention on improving the existing algorithms for genome sequence matching. We have successfully reduced the huge memory consumption of the suffix tree method. Also, by introducing a new quadtree pattern matching we are able to avoid the highly complicated matching process of the ACGT-words tree algorithm. Last but not least, our method performs multi-pattern searching, which is beyond the capability of the algorithms we set out to improve.
|Appears in Collections:|
Files in This Item:
There are no files associated with this item.
All items in NTOUR are protected by copyright, with all rights reserved.