Please use this identifier to cite or link to this item: http://ntour.ntou.edu.tw:8080/ir/handle/987654321/18631

Title: 藉由維度擴張與虛擬反矩陣之分群演算法
Clustering via Dimension Extension and Pseudo-inverse Transformation
Authors: Yu-Chen Chen
Contributors: NTOU:Department of Electrical Engineering
Keywords: Clustering; Pseudo-inverse transformation; Mahalanobis distance; Principal component analysis; Vote-based validation; Centroids; k-means
Date: 2007
Issue Date: 2011-07-04
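Abstract: This thesis proposes a new clustering method that combines dimension extension with a pseudo-inverse transformation to cluster high-dimensional data; we call it DEPIT (Dimension Extension and Pseudo-inverse Transformation). The method does not require the exact number of clusters to be set in advance: redundant centroids are deleted automatically during the iterative computation, which overcomes the problem of specifying too many initial centroids. The method first extends the dimension of the data and then uses the pseudo-inverse to express each data point as a linear combination of the cluster centroids. Each coefficient of the linear combination can be regarded as the similarity between the data point and the corresponding centroid (or as the projection of the point onto that centroid vector); the point is given the label of the centroid with the largest coefficient. For the data points that share a label, a new centroid is then computed using the Mahalanobis distance, which overcomes the influence of outliers on the centroid computation and suits high-dimensional data. This thesis also addresses validation of the clustering result. Principal component analysis is first used to detect whether a large portion of the data is distributed along a single principal axis. If so, the data are rotated by a fixed angle about the center of mass computed from all data positions, and the rotated data are clustered with DEPIT. After the results of several rotation-and-clustering runs are recorded, each data point has several labeling results; comparing which label occurs most often decides the final label of the point, much like a vote. This vote-based validation method determines the final clustering result.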
This thesis presents a novel approach that incorporates Dimension Extension and Pseudo-Inverse Transformation (DEPIT) to realize multi-dimensional data clustering. Unlike the k-means algorithm, the proposed DEPIT need not pre-specify the number of clusters k: during an iterative labeling process, centroid locations are updated and redundant centroids are eliminated automatically, thereby overcoming the problem of over-specifying the value of k. Clustering is performed by applying a pseudo-inverse transformation to the input data so that each data point is represented by a linear combination of bases with extended dimension; each basis corresponds to a centroid, and the coefficient associated with a basis represents the closeness between the data point and that basis. The issue of clustering validation is also addressed in this thesis. First, Principal Component Analysis is applied to detect whether a heavily dominant dimension exists; if so, the original input data are rotated by an angle with respect to a defined mass center, and the resulting data are subjected to the iterative labeling process. After several runs of rotation and iterative labeling, the labeled results from the runs are compared: a data point labeled to one centroid more times than to any other is assigned to the class indexed by that centroid, just like a voting process. Furthermore, we show that the Mahalanobis metric is well suited to updating centroids, as it is robust against outliers and more accurate and efficient in dealing with multi-dimensional data.
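The iterative labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made only for illustration: the abstract does not say how the dimension is extended, so the hypothetical `extend` helper simply appends a constant 1 to each vector; the centroid update uses a plain mean in place of the Mahalanobis-based update the thesis describes; and the names `label_points` and `depit_sketch` are invented here.

```python
import numpy as np

def extend(points):
    # ASSUMPTION: the abstract does not specify the dimension extension.
    # As a stand-in, append a constant 1 to every vector.
    return np.hstack([points, np.ones((points.shape[0], 1))])

def label_points(points, centroids):
    # Columns of B are the extended centroids, used as bases.
    B = extend(centroids).T              # shape (d + 1, k)
    X = extend(points).T                 # shape (d + 1, n)
    # Least-squares coefficients of each point over the bases.
    coeffs = np.linalg.pinv(B) @ X       # shape (k, n)
    # Each point takes the label of the basis with the largest coefficient.
    return np.argmax(coeffs, axis=0)

def depit_sketch(points, centroids, max_iter=50):
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(max_iter):
        labels = label_points(points, centroids)
        # Centroids that received no points are treated as redundant
        # and dropped.
        used = np.unique(labels)
        # ASSUMPTION: plain mean update, not the Mahalanobis-based
        # update described in the thesis.
        new_centroids = np.array(
            [points[labels == j].mean(axis=0) for j in used]
        )
        converged = (
            new_centroids.shape == centroids.shape
            and np.allclose(new_centroids, centroids)
        )
        centroids = new_centroids
        if converged:
            break
    return label_points(points, centroids), centroids

# Usage: two tight groups and three initial centroids, one of them redundant.
pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, cents = depit_sketch(pts, [[0.5, 0.5], [10.5, 10.5], [10.5, 0.5]])
print(labels, cents)
```

With the constant-1 extension the coefficients of each point sum to one when the bases are linearly independent, so they behave like barycentric weights, and the largest weight picks the closest-matching centroid. Whether the thesis's extension has this property is not stated in the abstract.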
URI: http://ethesys.lib.ntou.edu.tw/cdrfb3/record/#G0M94530079
Appears in Collections: [Department of Electrical Engineering] Theses and Dissertations

All items in NTOUR are protected by copyright, with all rights reserved.

