
A Mechanism for Handling Multi-Category Words by Combining Rules and Statistics

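【Author】 Zhang Lijing (张丽静)

【Advisor】 Huang Degen (黄德根)

【Author information】 Dalian University of Technology, Computer Application Technology, 2002, Master's thesis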

【Abstract】 Part-of-speech (POS) tagging is a fundamental task in natural language processing, and its accuracy matters for Chinese corpus annotation, machine translation, and information retrieval over large-scale text. This thesis studies POS tagging methods and analyzes the strengths and weaknesses of rule-based and statistical approaches. On this basis, it proposes a disambiguation strategy that combines rules with statistics. On the rule side, the construction of the rule base is improved: a multi-category word is represented in the rules by its set of possible parts of speech rather than by the word itself, and statistics are used, on a trial basis, to help build the rule base. On the statistical side, a learning mechanism is introduced on top of a bigram model, and the way the POS (transition) probabilities and lexical probabilities are obtained is revised according to the learning results. Following this strategy, a system for handling multi-category words was implemented; it reaches a tagging accuracy of 97.85% in the closed test and 96.71% in the open test. The experimental results indicate that combining rules and statistics effectively improves both POS disambiguation accuracy and POS tagging accuracy.
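To make the combined strategy concrete, the following is a minimal sketch only, not the system described in the thesis. It assumes a plain bigram tagger (add-one smoothing, Viterbi decoding) followed by a rule pass whose rules are keyed on a word's set of candidate tags rather than on the word itself. The toy corpus, the tag names, the single rule, the order in which statistics and rules are applied, and all function names are hypothetical. The thesis's learning mechanism is not reproduced here, because the abstract does not specify it.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus; "plan" is a multi-category word (NOUN or VERB).
CORPUS = [
    [("we", "PRON"), ("plan", "VERB"), ("trips", "NOUN")],
    [("the", "DET"), ("plan", "NOUN"), ("works", "VERB")],
    [("we", "PRON"), ("like", "VERB"), ("the", "DET"), ("plan", "NOUN")],
]

# Hypothetical rule: after a DET, a word whose candidate tags are
# {NOUN, VERB} is tagged NOUN. The key is the ambiguity class, not the word.
RULES = {("DET", frozenset({"NOUN", "VERB"})): "NOUN"}


def train(corpus):
    """Count tag bigrams, tag-word pairs, and each word's observed tags."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    lexicon = defaultdict(set)
    for sentence in corpus:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            lexicon[word].add(tag)
            prev = tag
    return trans, emit, lexicon


def log_prob(table, given, event, n_outcomes):
    """Add-one smoothed log P(event | given)."""
    row = table.get(given, {})
    total = sum(row.values())
    return math.log((row.get(event, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, lexicon):
    """Most probable tag sequence under the bigram model."""
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(lexicon) + 1  # one extra slot for unseen words
    best = {
        t: (log_prob(trans, "<s>", t, n_tags)
            + log_prob(emit, t, words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        step = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + log_prob(trans, p, t, n_tags), best[p][1])
                 for p in tags),
                key=lambda item: item[0],
            )
            step[t] = (score + log_prob(emit, t, word, n_words), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


def apply_rules(words, tags, lexicon, rules):
    """Override statistical tags wherever a rule matches (assumed order)."""
    out = list(tags)
    for i in range(1, len(words)):
        candidates = frozenset(lexicon.get(words[i], ()))
        fixed = rules.get((out[i - 1], candidates))
        if fixed is not None:
            out[i] = fixed
    return out


def tag(words, trans, emit, lexicon, rules=RULES):
    """Statistical tagging first, then rule-based correction."""
    return apply_rules(words, viterbi(words, trans, emit, lexicon),
                       lexicon, rules)


TRANS, EMIT, LEXICON = train(CORPUS)
```

Calling `tag(["we", "plan", "the", "plan"], TRANS, EMIT, LEXICON)` tags the first "plan" as a verb and the second as a noun on this toy data. Keying the rule on the ambiguity class means one rule covers every noun/verb word, which is one reading of how replacing the word by its parts of speech keeps a rule base small.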

  • 【Classification number】TP391.1
  • 【Times cited】8
  • 【Downloads】267