Studies on Automatic Recognition of Common Chinese Adverb’s Usages Based on Statistics

【Author】 Zhang Junhui

【Supervisor】 Zan Hongying

【Author Information】 Zhengzhou University, Computer Software and Theory, 2010, Master's thesis

【Abstract】 Automatic recognition of modern Chinese adverb usages is a key part of research on an NLP-oriented knowledge base of modern Chinese adverbs. To address the shortcomings of the existing rule-based approach to recognizing adverb usages, this thesis builds on previous work and proposes a statistics-based method for automatically recognizing the usages of common Chinese adverbs. Using conditional random field (CRF), maximum entropy (ME), and support vector machine (SVM) models, experiments were run on eight common modern Chinese adverbs over the word-segmented, POS-tagged January 1998 People's Daily corpus. The results show that the statistical approach recognizes adverb usages well, predicts usages in unseen text effectively, and achieves high precision on real-world text; compared with the rule-based method, average precision improves substantially. The statistical approach therefore shows good application prospects for automatically recognizing the usages of common modern Chinese adverbs.

Following the idea proposed by Yu Shiwen et al. of building a "trinity" knowledge base of modern Chinese function words, this thesis focuses on the automatic recognition of modern Chinese adverb usages and aims to achieve it with statistical machine learning. The main work comprises:
(1) For the preliminary modern Chinese adverb knowledge base, the example sentences in the adverb usage dictionary are used as a corpus to examine the usage rules; problems in the rules are analyzed and the rules revised, further improving the knowledge base.
(2) The rule-based method is applied to recognize adverb usages in the People's Daily corpus, and the results are manually proofread in several passes to form an adverb usage corpus, which serves as the experimental corpus. During proofreading, errors in the rule-based output are analyzed, and the adverb usage dictionary and the usage rule base are refined further.
(3) To overcome the limitations of the rule-based method, statistics-based automatic recognition of common modern Chinese adverb usages is implemented, further improving recognition precision.

Finally, the thesis summarizes the research, outlines directions for future work, and argues that combining rule-based and statistical methods for recognizing modern Chinese adverb usages is feasible.
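The statistical recognition described above treats each occurrence of an adverb in the tagged corpus as an instance to be labeled with a usage based on its context. The Python sketch below is a rough illustration of that idea, not the thesis's implementation: it pairs a windowed word/POS feature extractor with a small maximum-entropy classifier written from scratch (ME is one of the three models the abstract names; CRF and SVM are not shown). Everything specific here is an assumption: the whitespace-separated `word/pos` line format, the ±2 token window, the usage labels `dou_all`, `dou_already`, `dou_even` for 都, the toy training sentences, and the helper names (`parse_line`, `context_features`, `adverb_instances`, `MaxEnt`, `accuracy`) are all hypothetical. The `accuracy` helper computes the fraction of instances labeled correctly, which is only one possible reading of the precision figure the abstract reports.

```python
import math
from collections import defaultdict


def parse_line(line):
    """Split a segmented, POS-tagged line such as "他们/r 都/d 来/v" into (word, pos) pairs.
    Assumed format: whitespace-separated tokens of the form word/pos (hypothetical)."""
    pairs = []
    for token in line.split():
        word, _, pos = token.rpartition("/")
        pairs.append((word, pos))
    return pairs


def context_features(pairs, i, window=2):
    """Binary context features for the adverb at position i: neighbouring words and POS tags."""
    feats = ["bias"]
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        word, pos = pairs[j] if 0 <= j < len(pairs) else ("<PAD>", "<PAD>")
        feats.append(f"w[{off}]={word}")
        feats.append(f"p[{off}]={pos}")
    return feats


def adverb_instances(line, adverb, tag="d"):
    """Yield a feature list for every occurrence of `adverb` tagged as an adverb in the line."""
    pairs = parse_line(line)
    for i, (word, pos) in enumerate(pairs):
        if word == adverb and pos == tag:
            yield context_features(pairs, i)


class MaxEnt:
    """Maximum-entropy (multinomial logistic regression) classifier over binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # (feature, label) -> weight

    def probs(self, feats):
        # p(y | x) = exp(sum of w[f, y]) / Z(x), shifted by the max score for numerical stability
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def fit(self, data, epochs=50, lr=0.5):
        # Stochastic gradient ascent on the conditional log-likelihood (no regularization)
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for k, y in enumerate(self.labels):
                    grad = (1.0 if y == gold else 0.0) - p[k]
                    for f in feats:
                        self.w[(f, y)] += lr * grad
        return self

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[max(range(len(p)), key=p.__getitem__)]


def accuracy(gold, predicted):
    """Fraction of adverb instances whose predicted usage label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical toy data: the usage labels for 都 are invented purely for illustration.
TRAIN = [
    ("他们/r 都/d 来/v 了/y", "dou_all"),
    ("大家/r 都/d 同意/v", "dou_all"),
    ("都/d 十/m 点/q 了/y", "dou_already"),
    ("现在/t 都/d 八/m 点/q 了/y", "dou_already"),
    ("连/p 他/r 都/d 不/d 知道/v", "dou_even"),
    ("连/p 孩子/n 都/d 懂/v", "dou_even"),
]

data = [(feats, label) for line, label in TRAIN for feats in adverb_instances(line, "都")]
model = MaxEnt(["dou_all", "dou_already", "dou_even"]).fit(data)
```

On this toy model, `model.predict(next(adverb_instances("连/p 老师/n 都/d 不/d 会/v", "都")))` returns `"dou_even"`, because every context feature it has seen before (连 two tokens to the left, an adverb-tagged word to the right) occurred only with that label during training.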

  • 【Online Publication Submitter】 Zhengzhou University
  • 【Online Publication Year/Issue】 2011, Issue 06