
Please use this identifier to cite or link to this item: http://ntour.ntou.edu.tw:8080/ir/handle/987654321/49307

Title: 使用群眾地理資料進行地名解析
Toponym resolution using crowd-contributed geographic data
Authors: Chang, Ming-Lun
張銘倫
Contributors: NTOU:Department of Computer Science and Engineering
國立臺灣海洋大學:資訊工程學系
Keywords: Crowd-contributed geographic data; gazetteer; geocoding; multi-language processing; OpenStreetMap
Date: 2015
Issue Date: 2018-08-22T06:56:29Z
Abstract: Retrieving geocodes from text is a popular research topic. Previous studies have attempted to retrieve geocodes from blogs, tweets, and many other types of text sources. Many of them use natural language processing techniques to segment terms and then identify geocodes using a number of heuristics. However, these studies face two challenging problems: the difficulty of language processing and the granularity of geocodes. As a result, most current studies focus on a single language, and many identify gazetteer entries only at the town or city level. These constraints make them difficult to deploy in large-scale applications. In this study, we attempt to develop a high-resolution, multi-language geocoder for news articles. Unlike existing studies, we do not rely on natural language processing techniques. Instead, we use a simple N-gram based tokenizing approach and identify gazetteer entries using crowd-contributed geographic data. We also consider additional features available in crowd-contributed geographic data, including the area of each identified gazetteer entry and the relationships between entries, to improve the accuracy of our approach. With these designs, the proposed approach can identify gazetteer entries embedded in text and handle text in different languages. Our evaluation shows that the proposed approach achieves an accuracy of 90.64%.
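The N-gram lookup described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy `GAZETTEER`, its coordinates and areas, the function names, and the two disambiguation rules (drop a match nested inside a longer match, then prefer the smallest area) are all assumptions made for illustration. Character N-grams are used so that the same code handles text without word boundaries, such as Chinese.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical toy gazetteer. Coordinates and areas are illustrative
# values only, not data taken from OpenStreetMap or from the thesis.
GAZETTEER: Dict[str, Dict[str, float]] = {
    "Keelung": {"lat": 25.13, "lon": 121.74, "area_km2": 132.8},
    "Keelung Harbor": {"lat": 25.15, "lon": 121.75, "area_km2": 5.0},
    "Taipei": {"lat": 25.03, "lon": 121.57, "area_km2": 271.8},
    "基隆": {"lat": 25.13, "lon": 121.74, "area_km2": 132.8},
    "基隆港": {"lat": 25.15, "lon": 121.75, "area_km2": 5.0},
}

Match = Tuple[int, str]  # (start offset in the text, matched name)


def char_ngrams(text: str, max_len: int) -> Iterator[Match]:
    """Yield every character N-gram of length 1..max_len with its offset."""
    for start in range(len(text)):
        for length in range(1, min(max_len, len(text) - start) + 1):
            yield start, text[start:start + length]


def find_candidates(text: str, gazetteer: Dict[str, Dict[str, float]]) -> List[Match]:
    """Keep only the N-grams that are gazetteer names (no NLP involved)."""
    max_len = max(len(name) for name in gazetteer)
    return [(s, g) for s, g in char_ngrams(text, max_len) if g in gazetteer]


def drop_nested(candidates: List[Match]) -> List[Match]:
    """Assumed rule: discard a match whose span lies inside a longer match."""
    kept = []
    for start, name in candidates:
        end = start + len(name)
        nested = any(
            s <= start and end <= s + len(n) and len(n) > len(name)
            for s, n in candidates
        )
        if not nested:
            kept.append((start, name))
    return kept


def resolve(text: str, gazetteer: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Assumed rule: among surviving matches, prefer the smallest area."""
    matches = drop_nested(find_candidates(text, gazetteer))
    if not matches:
        return None
    return min(matches, key=lambda m: gazetteer[m[1]]["area_km2"])[1]
```

For example, `resolve("A ferry left Keelung Harbor for Taipei this morning.", GAZETTEER)` selects `"Keelung Harbor"`: the shorter match `"Keelung"` is nested inside it, and its area is smaller than that of `"Taipei"`. A real system would query crowd-contributed data rather than an in-memory dictionary.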
URI: http://ethesys.lib.ntou.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dstdcdr&s=G0010257030.id
http://ntour.ntou.edu.tw:8080/ir/handle/987654321/49307
Appears in Collections: [Department of Computer Science and Engineering] Theses and Dissertations

Files in This Item:

File        Description  Size  Format
index.html               0Kb   HTML


All items in NTOUR are protected by copyright, with all rights reserved.

 


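Copyright policy statement: The content of this website is the institutional repository collection of National Taiwan Ocean University and is provided free of charge for public-interest uses such as academic research and public education. Please make fair use of this content and respect the rights of the copyright holders.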