中医医案文献自动分词研究
打开文本图片集
摘要:目的 研究适用于中医医案文献自动分词的方案。方法 使用层叠隐马模型作为分词模型,建立相关中医领域词典及测试语料库,对语料库中古代医案文献和现代医案文献各300篇进行分词及评测。结果 在未使用中医领域词典时,两类医案文献分词准确率均为75%左右;使用中医领域词典后,古代医案文献的分词准确率达到90.73%,现代医案文献的分词准确率达到95.66%。在未使用中医领域词典时,词性标注准确率古代医案文献为56.74%,现代医案文献为64.81%;使用中医领域词典后,现代医案文献为91.45%,明显高于古代医案文献的78.47%。结论 现有分词方案初步解决了中医医案文献的分词问题,对现代医案文献的词性标注也基本正确,但古代医案文献的词性标注影响因素较多,还需进一步研究。
关键词:中医医案文献;自动分词;中医领域词典;层叠隐马模型;词性标注
DOI:10.3969/j.issn.1005-5304.2015.02.012
中图分类号:R2-05 文献标识码:A 文章编号:1005-5304(2015)02-0038-04
Study on Automatic Word Segmentation for Traditional Chinese Medical Record Literature ZHANG Fan, LIU Xiao-feng, SUN Yan (Beijing University of Chinese Medicine, Beijing 100029, China)
Abstract:Objective To study the automatic word segmentation scheme suitable for traditional Chinese medical record literature. Methods Hierarchical Hidden Markov Model was used as segmentation model. Totally 300 ancient medical record literature and 300 modern medical record literature were set as experimental subjects to establish the dictionary of traditional Chinese medicine and the test corpus, with a purpose to segment the words and evaluate of the results. Results Without using dictionary of traditional Chinese medicine, the word segmentation accuracy of two kinds of medical record literature was about 75%;the part-of-speech tagging ...
== 试读已结束,如需继续阅读敬请充值会员 ==
|
本站文章均为原创投稿,仅供下载参考,付费用户可查看完整且有格式内容!
(费用标准:38元/2月,98元/2年,微信支付秒开通!) |
升级为会员即可查阅全文 。如需要查阅全文,请 免费注册 或 登录会员 |