English  |  正體中文  |  简体中文  |  Items with full text/Total items : 28611/40652
Visitors : 773436      Online Users : 49
RC Version 4.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
Scope Adv. Search
LoginUploadHelpAboutAdminister

Please use this identifier to cite or link to this item: http://ntour.ntou.edu.tw:8080/ir/handle/987654321/17892

Title: 具可變長度及容忍度特徵之字串搜尋比對演算法
Efficient Pattern Matching Algorithms for Variable-Length and Tolerant Strings
Authors: Jia-Han Chu
朱家漢
Contributors: NTOU:Department of Computer Science and Engineering
國立臺灣海洋大學:資訊工程學系
Keywords: 雜湊編碼;階梯式步進法;階梯式區間跳躍法;投票演算法
Hashing Encoding;Stepping;Interval Jumping;Voting Algorithm
Date: 2003
Issue Date: 2011-07-04
Abstract: 本論文提出一個可以供多重核甘酸與蛋白質序列進行共同區段之比對演算法,應用數值編碼之唯一性及階梯步進式或區間跳躍式比對之法則,增進共同區段之搜尋速度。演算法共分為三個步驟:編碼、排序與搜尋,編碼階段負責將鹼基或胺基酸轉換至數值空間集合,排序階段採快速排序演算法完成排序任務,而搜尋階段則依數列大小進行步進式比對或依比對樣本長度進行數值區間最佳之均勻切割或位元切割,以增加區間跳躍之機率並提升比對之速度。實驗結果證明本演算法可以將傳統比對所需之時間複雜度由O(mLi(Li-m+1)+mLj(Lj-m+1))降低至O(|Ii|+|Ij|)。爲了能夠提供核甘酸序列兩兩比對的正確次序,本論文運用不同之統計方式針對RNase家族、靈長與非靈長類家族、Yeast GAL promoter 與Yeast MATalpha2 promoter資料分別進行分析,並且可以預測出擁有較少共同子序列統計量的兩條核甘酸序列,除此之外,應用投票演算法機制提供出現率特徵與開放式基座容忍度設定之共同子字串搜尋功能。期望經由結合這些演算法能夠提供具實用價值並且高效率的查詢方法,從多重核甘酸序列中或蛋白質家族序列快速地搜尋出具有依特徵或具有容忍度的共同子序列。 關鍵字:雜湊編碼(Hash coding)、階梯式步進法(Ladderlike Stepping)、階梯式區間跳躍法(Ladderlike Interval Jumping)、投票演算法(Voting Algorithm)。
In this thesis, a novel algorithm for searching common segments in multiple DNA or animo acid sequences is designed. To improve efficiency in pattern searching, combination of hashing encoding, quick sorting and ladderlike stepping and/or interval jumping techniques are applied. Since multiple sequence alignment of DNA or animo acid sequences from the giant genomic database is usually time consuming, we develop a three-phase methodology to search common sub-strings and reduce the time complexity in the comparison. In the first coding phase, DNA or animo acid sequences are transformed into a numerical space set. Subsequently, the quick sort algorithms are employed in the second sorting stage to reorder the encoded data. In the last searching phase, ladderlike stepping and interval jumping rules are proposed to increase efficiencies of numerical comparison. In addition, uniform and bitwise interval segmentation techniques are applied prior to interval jumping procedures. The segmenting methodologies are designed according to the length of searching pattern, and the proposed ladderlike searching algorithms provide improved performance. Experimental results show that the algorithms are capable of reducing time complexity from O(mLi(Li-m+1)+mLj(Lj-m+1)) to O(|Ii|+|Ij|). In order to provide the right pair comparing order for multiple DNA or animo acid sequences, several strategies based on statistics are designed for several classified sequences. In addition to the analysis of comparing order, based on voting mechanism, the representation and tolerant features of sub-string among sequences are also taken into consideration in this thesis. Keywords: Stepping, Interval Jumping, Hashing Encoding, Voting Algorithm
URI: http://ethesys.lib.ntou.edu.tw/cdrfb3/record/#G0M91570001
http://ntour.ntou.edu.tw/ir/handle/987654321/17892
Appears in Collections:[資訊工程學系] 博碩士論文

Files in This Item:

File Description SizeFormat
index.html0KbHTML183View/Open


All items in NTOUR are protected by copyright, with all rights reserved.

 


著作權政策宣告: 本網站之內容為國立臺灣海洋大學所收錄之機構典藏,無償提供學術研究與公眾教育等公益性使用,請合理使用本網站之內容,以尊重著作權人之權益。
網站維護: 海大圖資處 圖書系統組
DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library IR team Copyright ©   - Feedback