
Please use this identifier to cite or link to this item: http://ntour.ntou.edu.tw:8080/ir/handle/987654321/30954

Title: Coreference Handling in a Medical Diagnosis Dialog System (醫療問診對話系統中指涉現象之處理)
Authors: Yen-Heng Chen
陳彥亨
Contributors: National Taiwan Ocean University (NTOU): Department of Computer Science and Engineering
Keywords: coreference; dialog system; medical diagnosis
Date: 2012
Issue Date: 2012-04-16T03:20:18Z
Abstract: This thesis presents a system for handling coreference in a computerized virtual patient (CVP) system, a domain-specific dialog system for medical diagnosis interviews. A CVP must cope with colloquial phenomena that occur frequently in conversation, notably coreference and ellipsis. Speakers omit words or substitute another expression for the original one, which leaves a sentence informationally incomplete and prevents the system from recovering its intended meaning. Coreference has been studied in many specific domains, but little work has addressed medical diagnosis dialog.
The experimental data come from recordings of virtual-patient teaching sessions and of outpatient consultations in a hospital. In total, 270 coreference instances and 1,301 pronominal anaphora instances were annotated.
Coreference handling is divided into two steps: detecting the boundaries of the referring string, and resolving it to the information it refers to. Detection compares each sentence against patterns of specific words and POS subsequences, extracted from subsequence statistics and word frequency, and reaches a precision of 94.72%.
Antecedents can be of several types, including diseases, body parts, family members, and the doctor and patient taking part in the dialog. The system records information of these types as the dialog proceeds so that referents can be recovered later. Rule-based resolution uses a distance strategy, a frequency strategy, or a hybrid of the two. The frequency strategy performs best, with an accuracy of 76.69%. A machine learning approach was also tested for resolving disease-type antecedents: a CRF (conditional random field) evaluated with 5-fold cross-validation reaches an accuracy of 75%. On the coreference types of interest, the overall F-score is 71.21%.
For pronominal anaphora, all pronouns can be detected by simple string matching against a pronoun list, so the work concentrates on classifying the antecedent type and resolving the antecedent. Using information in the sentence, disease words, and decision rules, antecedent type classification reaches 97.33%. When the true antecedent type is known, rule-based resolution with the frequency and distance strategies reaches an accuracy of 92%. The overall accuracy is 89.33%.
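The abstract names a distance strategy and a frequency strategy for rule-based resolution but does not give their details. The following is only a minimal sketch of how such strategies might look, not the thesis's implementation. The `Mention` record, its type labels, the function names, and the tie-breaking rule (most recent mention wins) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical record of an entity mentioned earlier in the dialog."""
    text: str  # surface form, e.g. "headache"
    type: str  # assumed type label, e.g. "disease" or "body_part"
    turn: int  # index of the dialog turn in which it appeared


def resolve_by_distance(history: List[Mention], wanted_type: str) -> Optional[str]:
    """Distance strategy: pick the most recent mention of the wanted type."""
    candidates = [m for m in history if m.type == wanted_type]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.turn).text


def resolve_by_frequency(history: List[Mention], wanted_type: str) -> Optional[str]:
    """Frequency strategy: pick the most often mentioned entity of the wanted type.

    Ties are broken by recency, which is an assumption of this sketch.
    """
    candidates = [m for m in history if m.type == wanted_type]
    if not candidates:
        return None
    counts = Counter(m.text for m in candidates)
    last_turn = {}
    for m in candidates:
        last_turn[m.text] = max(last_turn.get(m.text, -1), m.turn)
    return max(counts, key=lambda text: (counts[text], last_turn[text]))


# Invented example dialog history, for illustration only.
history = [
    Mention("headache", "disease", 0),
    Mention("head", "body_part", 0),
    Mention("fever", "disease", 1),
    Mention("headache", "disease", 2),
    Mention("cough", "disease", 3),
]
print(resolve_by_distance(history, "disease"))   # cough
print(resolve_by_frequency(history, "disease"))  # headache
```

In this invented history the two strategies disagree: the distance rule returns the latest disease mention, while the frequency rule returns the one mentioned most often. This is the kind of difference that comparing the two strategies, as the abstract does, would measure.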
URI: http://ethesys.lib.ntou.edu.tw/cdrfb3/record/#G0M98570019
http://ntour.ntou.edu.tw/handle/987654321/30954
Appears in Collections: [Department of Computer Science and Engineering] Theses and Dissertations

Files in This Item:
File: index.html; Size: 0Kb; Format: HTML


All items in NTOUR are protected by copyright, with all rights reserved.

 

