
Research on Japanese Dependency Parsing Technology

【Author】 成姣 (Cheng Jiao)

【Supervisor】 蔡东风 (Cai Dongfeng)

【Author Information】 Shenyang Aerospace University, Computer Application Technology, 2011, Master's thesis

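【Abstract】 Japanese dependency analysis is a basic technique of Japanese sentence analysis: based on Japanese dependency grammar, it determines the dependency relations between the bunsetsu (文節, phrasal units) of a sentence. Syntactic parsing is the primary foundation for deeper natural language processing such as semantic analysis, and an indispensable component of many natural language processing application systems. Dependency analysis has important applications in machine translation, information extraction, automatic question answering and other fields. Current research on Japanese dependency analysis concentrates on modifying the learning framework, and the learning algorithms are mostly Support Vector Machines or other margin-based and memory-based learning methods. Conditional Random Fields (CRFs), an excellent sequence labeler, perform very well at sequence labeling and have been applied successfully to natural language processing tasks with good results, yet no application of CRFs to Japanese dependency analysis has been reported. This thesis combines the cascaded chunking algorithm with CRFs for Japanese dependency analysis, incorporating rich contextual information so that each labeling unit receives an optimal label from the perspective of the whole sentence. Experiments on the Kyoto University Text Corpus (Version 4.0) show that, without using dynamic features, the method achieves good dependency accuracy and sentence accuracy. As a useful complement to statistical methods, rule-based methods are still widely used in many areas of natural language processing. Traditionally, rules are written by hand by knowledge engineers from their own experience and knowledge; they depend entirely on the linguistic knowledge of the engineers who write them, and building a rule set requires substantial manpower and resources. To address these shortcomings, this thesis adopts a CRF-based error-driven mechanism: the labels produced by a first CRF pass are added as features to the feature template of a second CRF pass, so that statistical learning automatically captures the regularities in the errors, and the trained model is then used to correct them. Experiments on the same corpus show that this method further improves dependency analysis.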
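The combination of cascaded chunking with a sequence labeler can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the CRF is abstracted into a hypothetical `tagger` callback that labels every unresolved bunsetsu except the last as D (modifies the next unresolved bunsetsu) or O (modifies one further right); `cascaded_chunking` and `oracle_tagger` are hypothetical names; and the deletion rule (a D-tagged bunsetsu is resolved only when its left neighbor is not also tagged D, because otherwise it may still be a head) is an assumption derived from the head-final, non-crossing nature of Japanese bunsetsu dependencies.

```python
from typing import Callable, List, Sequence

# Hypothetical tagger interface: given the indices of the bunsetsu that are
# still unresolved (left to right), return one tag for each of them except
# the last: "D" = modifies the next unresolved bunsetsu, "O" = modifies one
# further to the right. In the thesis this decision is made by a CRF.
Tagger = Callable[[List[int]], Sequence[str]]


def cascaded_chunking(n: int, tagger: Tagger) -> List[int]:
    """Return the head index of each of n bunsetsu (-1 for the final one)."""
    heads = [-1] * n
    remaining = list(range(n))
    while len(remaining) > 1:
        tags = list(tagger(remaining))
        # Japanese is head-final: the penultimate unresolved bunsetsu can only
        # modify the last one, which also guarantees progress in every pass.
        tags[-1] = "D"
        kept = []
        for k, idx in enumerate(remaining[:-1]):
            left_is_d = k > 0 and tags[k - 1] == "D"
            if tags[k] == "D" and not left_is_d:
                # Assuming non-crossing arcs, nothing unresolved still
                # modifies idx, so its head can be fixed and idx dropped.
                heads[idx] = remaining[k + 1]
            else:
                kept.append(idx)
        kept.append(remaining[-1])
        remaining = kept
    return heads


def oracle_tagger(gold: List[int]) -> Tagger:
    """Demo tagger that reads gold heads instead of running a CRF."""
    def tag(remaining: List[int]) -> List[str]:
        return ["D" if gold[a] == b else "O"
                for a, b in zip(remaining, remaining[1:])]
    return tag


if __name__ == "__main__":
    # 彼は / 彼女の / 温かい / 真心に / 感動した
    gold = [4, 3, 3, 4, -1]
    print(cascaded_chunking(5, oracle_tagger(gold)))  # [4, 3, 3, 4, -1]
```

With oracle tags, the example sentence resolves in four passes. Re-tagging the shrunken list on every pass is what gives the labeler fresh context: bunsetsu that become adjacent only after a deletion are compared directly in the next pass.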

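Giving each unit an optimal label "from the perspective of the whole sentence" corresponds to joint decoding in a linear-chain CRF. Below is a minimal Viterbi sketch over the D/O tag set, assuming the per-position scores `emit` and the tag-to-tag scores `trans` have already been computed from trained feature weights; both names, the function name `viterbi`, and the toy numbers in the demo are hypothetical.

```python
from typing import Dict, List, Sequence

TAGS = ("D", "O")


def viterbi(emit: Sequence[Dict[str, float]],
            trans: Dict[str, Dict[str, float]]) -> List[str]:
    """Highest-scoring tag sequence under additive linear-chain scores.

    emit[i][t]: score of tag t at position i; trans[p][t]: score of tag p
    being followed by tag t. Scores are summed, as in log-space CRF decoding.
    """
    if not emit:
        return []
    best = [dict(emit[0])]  # best[i][t]: best prefix score ending in tag t
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emit)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] + trans[p][t])
            best[i][t] = best[i - 1][prev] + trans[prev][t] + emit[i][t]
            back[i][t] = prev
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(emit) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    emit = [{"D": 1.0, "O": 0.0}, {"D": 0.6, "O": 0.5}]
    trans = {"D": {"D": -1.0, "O": 0.0}, "O": {"D": 0.0, "O": 0.0}}
    print(viterbi(emit, trans))  # ['D', 'O']
```

Both positions individually prefer D, but the made-up penalty on a D→D transition makes `["D", "O"]` (total 1.5) beat `["D", "D"]` (total 0.6), which is the difference between sentence-level and per-unit decisions.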
【Abstract】 Japanese dependency analysis is a basic technique in Japanese sentence analysis: based on Japanese dependency grammar, it determines the dependency relations between bunsetsu. Syntactic parsing is the primary basis for deep natural language processing such as semantic analysis and an indispensable part of many natural language processing application systems. Dependency analysis plays an important role in machine translation, information extraction, automatic question answering and other fields. Current research on Japanese dependency analysis has focused on changing the learning framework, and the machine learning algorithms used are mostly Support Vector Machines or other margin-based or memory-based learning methods. Conditional Random Fields (CRFs) are an excellent sequence labeler and have been used successfully in natural language processing tasks with good results; however, no application of CRFs to Japanese dependency analysis has been reported. This thesis proposes a method that combines the Cascaded Chunking Algorithm with Conditional Random Fields, incorporating rich contextual information to give each unit an optimal label from the perspective of the whole sentence. Experiments on the Kyoto University Text Corpus (Version 4.0) show that the method achieves good dependency accuracy and sentence accuracy even without dynamic features. As a useful complement to statistical methods, rule-based methods are still widely used in many natural language processing fields. Traditional rules are written by hand by knowledge engineers according to their experience and knowledge, so they depend entirely on the linguistic knowledge of those engineers, and building a rule set requires a great deal of manpower and material resources.
To make up for these shortcomings of the traditional rule-based approach, an error-driven technique based on Conditional Random Fields is adopted to parse a second time and improve the results. The labels produced by the first CRF parsing stage are added as features to the feature template of the second stage, so that statistical learning automatically captures the regularities of the errors, and the resulting trained model corrects those errors in the second parse. Experimental results on the same corpus show that the method further improves the accuracy of dependency analysis.
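How the first-pass output can enter the second pass is sketched below. This is a minimal, hypothetical illustration of stacking, assuming each bunsetsu is already described by a dictionary of string features (the key `particle`, the function name `add_stage1_features`, and all generated feature names are made up for the example) and that the first-stage CRF has produced one D/O label per position. The function only builds the augmented feature rows; it trains nothing.

```python
from typing import Dict, List, Sequence

Features = Dict[str, str]


def add_stage1_features(base: Sequence[Features],
                        stage1: Sequence[str],
                        window: int = 1) -> List[Features]:
    """Copy each position's base features and append first-pass CRF labels.

    Positions outside the sentence get the padding labels "BOS"/"EOS".
    """
    if len(base) != len(stage1):
        raise ValueError("one first-stage label is needed per position")
    rows = []
    for i, feats in enumerate(base):
        row = dict(feats)
        # First-pass labels in a window around i, e.g. "s1[-1]", "s1[+0]".
        for off in range(-window, window + 1):
            j = i + off
            if j < 0:
                label = "BOS"
            elif j >= len(stage1):
                label = "EOS"
            else:
                label = stage1[j]
            row[f"s1[{off:+d}]"] = label
        # Conjoin the first-pass label with a base feature so the second
        # stage can learn in which contexts that label tends to be wrong.
        if "particle" in feats:
            row["s1[+0]|particle"] = f"{stage1[i]}|{feats['particle']}"
        rows.append(row)
    return rows
```

The conjunction feature is one way to let the second model learn the contexts in which a first-pass label is unreliable; which conjunctions the thesis's template actually contains is not specified in the abstract.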

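Dependency accuracy and sentence accuracy are commonly computed as below for bunsetsu-based Japanese dependency parsing. The abstract does not spell out its conventions, so excluding the sentence-final bunsetsu (which has no head) from the dependency count, and counting a one-bunsetsu sentence as correct, are assumptions here; `dependency_scores` is a hypothetical name.

```python
from typing import Sequence, Tuple


def dependency_scores(
    pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]
) -> Tuple[float, float]:
    """Return (dependency accuracy, sentence accuracy) for (gold, pred) pairs.

    Each sequence holds one head index per bunsetsu. The sentence-final
    bunsetsu has no head and is skipped (assumed convention).
    """
    correct = total = exact = 0
    for gold, pred in pairs:
        if len(gold) != len(pred):
            raise ValueError("gold and predicted head lists differ in length")
        n_deps = max(len(gold) - 1, 0)
        hits = sum(g == p for g, p in zip(gold[:n_deps], pred[:n_deps]))
        correct += hits
        total += n_deps
        # A sentence is correct only if every scored dependency is correct;
        # a one-bunsetsu sentence is therefore trivially correct (assumption).
        exact += int(hits == n_deps)
    dep_acc = correct / total if total else 0.0
    sent_acc = exact / len(pairs) if pairs else 0.0
    return dep_acc, sent_acc
```

For example, one fully correct five-bunsetsu sentence plus a three-bunsetsu sentence with one wrong head gives 5 of 6 correct dependencies but only 1 of 2 correct sentences, which is why sentence accuracy is the stricter of the two measures.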