面向文景转换的中文浅层语义分析方法研究
Research on the Method of Chinese Shallow Semantic Parsing for Text-to-Scene Conversion
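【Author】 Li Shiqi (李世奇);
【Supervisor】 Zhao Tiejun (赵铁军);
【Author information】 Harbin Institute of Technology (哈尔滨工业大学), Computer Application Technology, 2011, PhD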
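【Abstract (translated from the Chinese)】 This thesis presents a comprehensive, in-depth study of the key problems in Chinese shallow semantic parsing. Shallow semantic parsing is a research focus in natural language processing, and methods based on linguistic features and statistical machine learning are currently the mainstream approach; the most critical factors in this approach are feature selection and the optimization of the machine learning method. In addition, the shallow semantic parsing studied here is mainly oriented toward the application task of text-to-scene conversion, which means automatically converting natural language text into a corresponding scene or animation by computer, and which is an emerging research direction of theoretical and practical importance. The thesis first studies the coreference resolution module required for text-to-scene conversion; it then explores shallow semantic parsing from the perspective of feature selection, identifying syntactic features with strong discriminative power; next it proposes a combined classification model to improve shallow semantic parsing; finally it proposes a method based on a computational cognitive model, exploring Chinese shallow semantic parsing at a deeper level. Specifically, the thesis covers the following research:
(1) First, an unsupervised method for Chinese noun phrase coreference resolution based on an Adaptive Resonance Theory (ART) network is proposed. The method makes full use of the features of the noun phrases themselves and controls the number of clusters dynamically by adjusting the parameters of the ART network model, which effectively addresses the difficulty of determining the number of output classes in current clustering-based coreference resolution. The clustering algorithm also uses a feature selection method based on information gain ratio, reducing the interference that weakly discriminative features bring to clustering. While maintaining coreference resolution accuracy, the method offers good portability and robustness, and it overcomes the main obstacle in the preprocessing stage of shallow semantic parsing for text-to-scene conversion.
(2) The thesis studies Chinese shallow semantic parsing in depth at the level of linguistic features and proposes a method based on multiple syntactic features. Existing research indicates that improving the feature set is currently the most effective way to raise shallow semantic parsing performance. The thesis proposes fusing two types of syntactic features, phrase-structure and dependency, to give shallow semantic parsing richer and complementary syntactic information. On the basis of these two feature sets, a statistics-based method for selecting combined features is proposed, which quickly and effectively filters out the combined features suited to each classification stage according to how the individual features are distributed in the corpus. Finally, the phrase-structure features, the dependency features, and the combined features built from them are used for the classification tasks in semantic parsing. Experiments show that the proposed multiple syntactic feature set effectively improves Chinese shallow semantic parsing and achieves good results with both gold-standard and automatic syntactic parses.
(3) A Chinese shallow semantic parsing method based on a combined classification model is proposed, further improving shallow semantic parsing from the perspective of optimizing the machine learning method. Building on the multiple syntactic feature set above, five machine learning methods (k-nearest neighbor, decision tree, perceptron, maximum entropy, and support vector machine) are used to build five semantic role classification models on the training corpus as the basic units of the combined model. An input-dependent gating system then integrates the five basic classifiers; adjusting the gating parameters coordinates the basic classifiers and controls the output of the combined model. Finally, the EM algorithm is used to learn the gating parameters on the training corpus. Training and testing on a general-purpose corpus show that the method significantly improves Chinese semantic role labeling.
(4) Finally, the thesis proposes an exploratory method for Chinese shallow semantic parsing based on a computational cognitive model. Grounded in cognitive theory, it simulates the human language understanding process in order to study Chinese shallow semantic parsing at a fundamental level. A propositional semantic representation oriented to the computational cognitive model and to text-to-scene conversion is designed first; this propositional form expresses the semantic information contained in natural language simply and efficiently. The propositions serve as the basic units of the cognitive model. The spreading-activation mechanism of neurons in the human brain is then simulated on the cognitive model network, so that proposition nodes consistent with the contextual constraints are progressively strengthened and those inconsistent with them are progressively weakened. Once the network reaches a stable state, the predicate-related semantic parsing result can be recovered from the proposition nodes that remain activated.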
【Abstract】 This thesis presents a comprehensive study of the key issues in Chinese shallow semantic parsing (SSP). SSP is an important research topic in Natural Language Processing (NLP). Currently, the prevailing approach to SSP combines linguistic features with statistical machine learning, and it depends on two key factors: the selection of linguistic features and the optimization of the machine learning method. The SSP research in this thesis is oriented to Text-to-Scene conversion, which automatically converts natural language text into a corresponding scene or animation by computer. It is an emerging research area of theoretical and practical significance. First, we study coreference resolution, a necessary pre-processing module for Text-to-Scene conversion. Second, we explore SSP from the perspective of linguistic feature selection and identify discriminative syntactic features. Third, we propose a combined machine learning method to further improve SSP. Finally, we study the problem at a deeper level and propose an approach to SSP based on a computational cognitive model. Specifically, this thesis covers the following:
(1) We propose an unsupervised noun phrase coreference resolution method for Chinese based on an Adaptive Resonance Theory (ART) network. The method makes full use of the features of noun phrases and dynamically controls the number of clusters by adjusting the parameters of the ART network. It thus offers an effective solution to a critical problem in clustering-based coreference resolution: the number of output clusters, namely the number of coreference sets, is difficult to determine. In addition, the clustering algorithm uses a feature selection method based on information gain ratio to reduce the interference caused by weakly discriminative features.
This method achieves relatively high coreference resolution accuracy while remaining portable and robust. It addresses the major obstacle in the pre-processing phase of Chinese SSP for Text-to-Scene conversion.
(2) We then study the linguistic features for Chinese SSP in depth and propose a method based on multiple syntactic features. Existing research shows that improving the feature set is currently the most effective way to enhance SSP performance. The proposed method integrates constituent-based and dependency-based syntactic features into a basic feature set, providing richer and complementary syntactic information for SSP. On top of this basic feature set, we propose a statistical method for selecting combined features, which efficiently selects discriminative combined features according to the distribution of each feature in the corpus. Finally, we use the constituent-based syntactic features, the dependency-based syntactic features, and the selected combined features for the classification steps in SSP. Experiments show that the proposed method improves the performance of Chinese SSP, with good results under both gold-standard and automatic syntactic parsing.
(3) We further propose a combined machine learning method that improves SSP from the perspective of optimizing the machine learning method. Building on the multiple syntactic features described above, it adopts five basic machine learning methods: K-Nearest Neighbor, Decision Tree, Perceptron, Maximum Entropy, and Support Vector Machine. We train five classification models with these methods on the training corpus to serve as the basic units of the combined model. We then use an input-dependent gating system to integrate the five basic classification models and control the output of the combined model by adjusting the parameters of the gating system.
Finally, we use the Expectation Maximization (EM) algorithm to learn the parameters of the gating system from the training data. Experimental results show that the method significantly improves the performance of Chinese SSP.
(4) Lastly, this thesis proposes an exploratory Chinese SSP method based on a computational cognitive model. Grounded in cognitive theory, the method simulates the human language understanding process in order to study semantic analysis at a more fundamental level. We first define a propositional semantic representation oriented to the cognitive model and to Text-to-Scene conversion; these propositions express the semantics of natural language simply and efficiently. We take the propositions as the basic units (nodes) of the cognitive model. Spreading activation is then applied iteratively: contextually appropriate propositions are gradually strengthened and inappropriate ones are inhibited until the network stabilizes. Finally, the SSP result for a predicate is recovered from the propositions that remain activated in the cognitive model.
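The spreading-activation idea in (4) can be illustrated with a minimal Python sketch. It is an illustration only and is not taken from the thesis: the three proposition strings, the link weights, the initial activations, the 0.5 selection threshold, and the update rule (add the weighted input from neighbours, clip at zero, rescale by the peak activation) are all assumptions chosen to keep the example small.

```python
# Hypothetical proposition nodes: one context fact and two competing
# role assignments for the same predicate.
NAMES = [
    "animate(boy)",
    "eat(agent=boy, patient=apple)",
    "eat(agent=apple, patient=boy)",
]

# Symmetric link weights (assumed values): positive links support,
# negative links inhibit. The two competing readings inhibit each other.
WEIGHTS = [
    [0.0, 0.5, -0.5],
    [0.5, 0.0, -0.5],
    [-0.5, -0.5, 0.0],
]

# Assumed starting activations: the context fact is certain,
# both readings start out equally plausible.
INITIAL = [1.0, 0.5, 0.5]


def spread_activation(weights, initial, max_iters=100, tol=1e-6):
    """Iterate activation spreading until the activations stop changing."""
    n = len(initial)
    act = list(initial)
    for _ in range(max_iters):
        new = []
        for i in range(n):
            # Current activation plus weighted input from every other node.
            net = act[i] + sum(weights[i][j] * act[j] for j in range(n))
            new.append(max(0.0, net))  # activations never go negative
        peak = max(new)
        if peak > 0:
            new = [a / peak for a in new]  # rescale so the peak is 1.0
        delta = max(abs(a - b) for a, b in zip(new, act))
        act = new
        if delta < tol:  # network has stabilized
            break
    return act


def surviving(names, activations, threshold=0.5):
    """Return the propositions whose final activation exceeds the threshold."""
    return [name for name, a in zip(names, activations) if a > threshold]


if __name__ == "__main__":
    final = spread_activation(WEIGHTS, INITIAL)
    print(surviving(NAMES, final))
```

In this toy network the reading that conflicts with the context fact is driven to zero, while the compatible reading is reinforced, so only the latter survives the threshold. A real system would need a principled way to build the network and set its weights, which this sketch does not attempt.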
【Key words】 shallow semantic parsing; semantic role labeling; natural language processing; text-to-scene; computational cognitive model;