
Please use this identifier to cite or link to this item: http://ntour.ntou.edu.tw:8080/ir/handle/987654321/51454

Title: A feature selection approach for automatic e-book classification based on discourse segmentation
Authors: Jiunn-Liang Guo
Hei-Chia Wang
Ming-Way Lai
Contributors: National Taiwan Ocean University: Department of Merchant Marine
Keywords: Discourse segmentation
Feature selection
Text classification
Word sense disambiguation
Date: 2015-01
Issue Date: 2018-11-29T03:41:59Z
Publisher: Program: electronic library and information systems
Abstract: Purpose
– The purpose of this paper is to develop a novel feature selection approach for automatic text classification of large digital documents, namely the e-books of an online library system. The main idea is to automatically identify discourse features in order to improve the feature selection process, rather than focusing on the size of the corpus.

Design/methodology/approach
– The proposed framework automatically identifies the discourse segments within e-books, captures the discourse subtopics that are cohesively expressed in those segments, and treats these subtopics as informative and prominent features. The selected set of features is then used to train a support vector machine (SVM) classifier and perform the e-book classification task.
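The segmentation and subtopic-capture steps described above can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify; it assumes a simple lexical-cohesion heuristic in which a segment boundary is placed wherever two adjacent sentences share little vocabulary, and it takes the most frequent terms of each segment as subtopic features. The names `STOPWORDS`, `segment`, `subtopic_terms`, and the `threshold` and `top_k` parameters are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "by", "and", "in", "to"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed (stand-in for real preprocessing)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.1):
    """Split sentences into segments at points of low lexical cohesion (assumed heuristic)."""
    if not sentences:
        return []
    segments, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        similarity = cosine(Counter(tokenize(prev)), Counter(tokenize(nxt)))
        if similarity < threshold:
            # Adjacent sentences share almost no vocabulary: close the segment.
            segments.append(current)
            current = []
        current.append(nxt)
    segments.append(current)
    return segments


def subtopic_terms(segment_sentences, top_k=3):
    """Return the top_k most frequent terms of a segment as its subtopic features."""
    counts = Counter(t for s in segment_sentences for t in tokenize(s))
    return [term for term, _ in counts.most_common(top_k)]
```

Calling `segment` on a list of sentences returns a list of sentence groups, and `subtopic_terms` can then be applied to each group to obtain candidate features for a downstream classifier such as an SVM.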

Findings
– The evaluation shows that identifying discourse segments and capturing subtopic features leads to better classification performance than two conventional feature selection techniques, TFIDF and mutual information. It also demonstrates that discourse features play an important role among textual features, especially for large documents such as e-books.
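For readers unfamiliar with the two baselines named above, the following is a minimal sketch of how TFIDF weighting and mutual information scoring are commonly defined. The exact variants used in the paper are not stated in this record, so the formulas here (raw term frequency times the natural log of inverse document frequency, and the expected mutual information between term presence and class membership) are assumptions, as are the function names.

```python
import math


def tfidf(docs, term, doc_index):
    """TF-IDF of `term` in one document: raw count times ln(N / document frequency)."""
    tf = docs[doc_index].count(term)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


def mutual_information(docs, labels, term, target):
    """Expected mutual information (in bits) between term presence and class `target`."""
    n = len(docs)
    n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
    n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != target)
    n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == target)
    n00 = n - n11 - n10 - n01
    # Each cell: (joint count, marginal count for term state, marginal count for class state).
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    total = 0.0
    for n_tc, n_t, n_c in cells:
        if n_tc:  # empty cells contribute nothing
            total += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return total
```

Both functions expect `docs` as a list of token lists and `labels` as a parallel list of class names. Terms would then be ranked by score and the top-ranked ones kept as features.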

Research limitations/implications
– Automatically extracted subtopic features cannot be fed directly into the feature selection (FS) process; a threshold must be applied to control which features are admitted.

Practical implications
– The proposed technique demonstrates the promise of using discourse analysis to enhance the classification of large digital documents such as e-books, as compared with conventional techniques.

Originality/value
– A new FS technique is proposed that inspects the narrative structure of large documents, an approach that is new to the text classification domain. A further contribution is that the evaluation results provide more evidence for considering discourse information in future text analysis. The proposed system can be integrated into other library management systems.
Relation: 49(1) pp.2-22
URI: http://ntour.ntou.edu.tw:8080/ir/handle/987654321/51454
Appears in Collections: [Department of Merchant Marine] Journal Articles

All items in NTOUR are protected by copyright, with all rights reserved.

