

A Study of Chinese Textual Entailment

【Author】 Ni Shengjian (倪盛俭)

【Advisor】 Ji Donghong (姬东鸿)

【Author information】 Wuhan University, Linguistics and Applied Linguistics, 2013, PhD

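【Abstract (translated from the Chinese)】 The main task of textual entailment research is recognizing textual entailment, which has many applications in natural language processing. However, current recognition performance still falls short of what large-scale application requires. One important reason is that the linguistic motivations and methods involved in textual entailment and its recognition need further exploration or improvement, which is the main work of this thesis. In terms of research content, existing studies of the motivations behind Chinese textual entailment focus mainly on word relations and syntactic transformation. This study attempts to use the lexical functions of Meaning-Text Theory to improve the representation and analysis of word relations and, in particular, treats image schemata as motivations for textual entailment, using them to represent the linguistic knowledge that recognition requires. In terms of recognition methods, existing syntactic dependency analysis is insufficient to reveal the linguistic information that textual entailment requires, so this study uses conceptual dependency analysis to supplement it. Conceptual dependency analysis, however, cannot uncover all of the linguistic motivations behind textual entailment, such as metalinguistic functions and the correspondences between abstract concepts and concrete verbal expressions; lexical functions can be used to describe these. Image schemata can serve as motivations for textual entailment because they are idealized, conventionalized, and predictable. Conventionalizing common sense can improve corpus coverage in textual entailment recognition. Lexical functions and image schemata both conventionalize common sense, and the two are complementary in doing so. As the main process of textual inference and understanding, textual entailment recognition involves various cognitive mechanisms, such as conceptual integration, metaphor, and metonymy. No single analytical method can identify all the motivations, and no single theory can explain all the phenomena. This thesis therefore adopts a combined methodology and explanations from multiple perspectives. For example, lexical functions fill many of the gaps left by conceptual dependency analysis based on schema mapping, and besides conceptual integration theory, other theories such as Default Theory, Relevance Theory, and Adaptation Theory are used to explain problems and phenomena in Chinese textual entailment recognition. The thesis has nine chapters, whose main contents or viewpoints are as follows. Chapter 1 is the introduction; it explains the concept of textual entailment, the reasons for choosing the topic, the state of research, and the content, purpose, and significance of the study, and introduces its main theoretical background, methods, and resources. Chapter 2 classifies and defines the types of textual entailment and discusses the methods for recognizing textual semantic entailment and the motivations involved in the recognition process. The chapter analyzes semantic entailment mainly with frame dependency analysis and lexical functions. The analysis shows that conceptual dependency analysis can recognize textual semantic entailment effectively, that conceptual dependency analysis and lexical functions are complementary in textual entailment recognition, and that metonymy is not necessarily based on image schemata. Chapter 3 studies the recognition of textual semantic presupposition and the motivations involved. The analysis shows that image schemata play an important role in recognizing textual semantic presupposition, and that the fundamental idea or operation of conceptual dependency analysis is schematic projection from a conceptual structure onto concrete sentences rather than analysis of the semantic relations inside a concrete text. Chapter 4 studies the recognition of and motivations for textual conventional conversational implicature and explores the conventionalization of common sense in textual entailment recognition. The analysis shows that: (1) textual conventional conversational implicature best demonstrates the role of the various image schemata and of conceptual dependency analysis, especially that of frames and frame dependency analysis; (2) in recognizing textual entailments motivated by image schemata, compression based on image schemata effectively extends the range of vital-relation compression described in conceptual integration theory; (3) conceptual dependency analysis and conceptual integration are closely connected; (4) recognizing the entailment correspondences between abstract, metalinguistic concepts and concrete, embodied expressions supplements recognition methods based on conceptual dependency analysis, word relations, and syntactic transformation. Linking metalinguistic concepts with embodied expressions is one of the important tasks in improving textual entailment recognition. Chapter 5 discusses the recognition of textual resultative entailment and its linguistic motivations. The analysis shows that scripts play an important role in recognizing textual resultative entailment, and that syntax and word relations can also express causal relations and so can also motivate textual resultative entailment. Chapter 6, drawing on the preceding research and a review of the corpus, offers a preliminary discussion of resource construction for Chinese textual entailment recognition in natural language processing. The construction of different resources must take into account the character of language as an integrated system. All the resources share one goal: to serve Chinese textual entailment recognition in natural language processing. This requires that the resources together cover the corpus comprehensively while avoiding excessive overlap, so that they are appropriately discrete. In addition, conflicts between resources should be avoided as far as possible, and where they cannot be avoided, a mechanism for resolving them must be provided. Because the resources are built for natural language processing, every image schema repository must be machine-readable, which requires the participation of computing specialists. Chapter 7 gives examples of applying this research. Applying Chinese textual entailment recognition is in some respects simpler and in others more complex than the recognition studied in the earlier chapters. The great majority of the examples in this chapter are taken from Chinese Proficiency Test (HSK) questions, which also shows the potential value of this research both for Chinese textual entailment recognition in natural language processing and for teaching Chinese as a foreign language. Chapter 8 discusses some important problems encountered in this research, including the difficulty and probabilistic nature of textual entailment recognition, the cancellability and projection of textual semantic presupposition, and the metaphor and metonymy involved in textual entailment recognition. Chapter 9 is the conclusion; it summarizes the study and points out possible future work.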

【Abstract】 The main task of textual entailment research is the recognition of textual entailment, which has many applications in natural language processing. However, the performance of existing recognition methods is still far from what large-scale application requires. One of the most important reasons is that the motivations behind textual entailment need further investigation and the methods used to recognize it need further improvement, which is the main work of this thesis.
Existing studies of the motivations behind textual entailment concentrate mainly on word relations and syntactic transformation. This thesis extends word relations as motivations for textual entailment by using lexical functions from Meaning-Text Theory. In particular, image schemata are used as motivations behind textual entailment to express the knowledge needed to recognize it.
As far as recognition methods are concerned, existing syntactic dependency analysis is not enough to uncover the linguistic knowledge that textual entailment requires, so this thesis supplements it with conceptual dependency analysis. Nevertheless, conceptual dependency analysis cannot uncover every linguistic motivation behind textual entailment; for example, it cannot capture meta-language functions or the correspondences between abstract concepts and concrete expressions. Lexical functions can fill this gap effectively.
Image schemata can serve as motivations behind textual entailment because they are idealized, conventionalized, and predictable. Conventionalizing common sense can improve corpus coverage in the recognition of textual entailment. Both lexical functions and image schemata conventionalize common sense, and they are complementary in doing so.
Recognition of textual entailment, as the main procedure of textual inference and understanding, involves many kinds of cognitive mechanisms, such as conceptual integration, metonymy, and metaphor. These problems cannot be solved or explained by a single method or theory. For this reason, this thesis applies a combined methodology and explanations from multiple angles. For example, lexical functions are used to fill the gaps left by conceptual dependency analysis; besides the theory of conceptual integration, other theories such as Default Theory, Relevance Theory, and Adaptation Theory are also used to explain problems and phenomena in the recognition of Chinese textual entailment.
This thesis has nine chapters, with the following contents or viewpoints:
Chapter 1 is the introduction. It explains the concept of textual entailment, the reasons for choosing the topic, the state of research, and the contents, purposes, and significance of this study. It also introduces the theoretical background, methods, and resources used in the study.
Chapter 2 classifies and defines textual entailments and discusses the methods for recognizing textual semantic entailment and the motivations behind it. Frame dependency analysis and lexical functions are the main tools used in recognizing textual semantic entailment. The analysis shows that frame dependency analysis is effective in recognizing textual semantic entailment, that conceptual dependency analysis and lexical functions are complementary in the recognition of textual entailment, and that metonymy is not necessarily based on image schemata.
Chapter 3 studies the recognition of textual semantic presupposition and the motivations behind it. It shows that image schemata play an important role in recognizing textual semantic presupposition, and that the fundamental idea or operation of conceptual dependency analysis is schematic mapping from an image schema to a concrete sentence, not analysis of the internal semantic relations of a concrete sentence.
Chapter 4 discusses the recognition of and motivations behind textual conventional conversational implicature, as well as the conventionalization of common sense in the recognition of textual entailment. The analysis shows that: (1) textual conventional conversational implicature best demonstrates the usefulness of image schemata and conceptual dependency analysis, especially frame dependency analysis; (2) in recognizing textual entailments motivated by image schemata, compressions based on image schemata effectively expand the domain of compression of key relations in the theory of conceptual integration; (3) conceptual dependency analysis is closely related to conceptual integration; (4) recognizing the entailment correspondences between abstract, meta-language concepts and concrete, embodied expressions complements the recognition of textual entailment through conceptual dependency analysis, word relations, and syntactic transformation. Establishing the links between meta-language concepts and embodied expressions is one of the important tasks in improving the recognition of textual entailment.
Chapter 5 discusses the recognition of and linguistic motivations behind textual resultative entailment. The analysis shows that scripts play an important role as motivations in recognizing textual resultative entailment, and that syntax and special word relations, which can also express cause-effect relations, can likewise be motivations for textual resultative entailment.
Based on the preceding study and a review of the corpus, Chapter 6 tentatively discusses the construction of resources for recognizing textual entailment in natural language processing. In constructing these resources, the characteristics of language as a unified system should be considered, and all the resources share the same purpose: to serve the recognition of textual entailment in natural language processing. The resources should therefore cover the corpus as fully as possible while avoiding excessive overlap, so that they remain suitably discrete. Conflicts among resources should be avoided, and where they cannot be avoided, mechanisms should be provided to resolve them. Because these resources are oriented toward natural language processing, all image schema resources must be machine-readable, which means that computing experts are needed in resource construction.
Chapter 7 offers examples of applying this study. Applying the recognition of textual entailment is in some respects simpler and in others more complex than the recognition studied in the preceding chapters. Most of the examples come from the Chinese Proficiency Test, which shows the potential of this study both for the recognition of textual entailment in natural language processing and for teaching Chinese as a second language.
Chapter 8 explains some problems involved in this study. Many problems arose during the study; only those the author felt he understood in sufficient depth are discussed in detail. They are: the difficulty and probabilistic nature of the recognition of textual entailment, the cancellability and projection of textual semantic presupposition, and the metaphor and metonymy involved in the recognition of textual entailment.
Chapter 9 is the conclusion, summarizing this study and pointing out possible future work.

  • 【Online publication contributor】 Wuhan University
  • 【Online publication year and issue】 2014, Issue 05

