现代汉语形容词概念语义模型研究
A Conceptual Model of Modern Chinese Adjectives
【Author】 Li Xuening (李学宁);
【Supervisor】 Lu Ruzhan (陆汝占);
【Author information】 Shanghai Jiao Tong University, Computer Software and Theory, 2008, Ph.D.
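【Chinese Abstract】 In natural language understanding and computational linguistics, one active research topic is adjectival modification, i.e. adjectives used attributively to modify nouns. The classical approach treats both the adjective and the noun as predicates and computes the meaning of the adjective–noun combination by the principle of compositionality. Although this agrees with the clear reasoning of mathematical logic, it cannot accurately capture how an adjective's semantic features change when it modifies different nouns. In Chinese adjective–noun compounds the adjective denotes a feature value of an objective entity. Whether the semantic analysis of the whole compound can accurately represent these features (attribute names and their values) at the conceptual level is therefore key to keeping the entity's intensional concept complete. This directly affects how completely user needs are represented in conceptual-graph-based information retrieval, and hence how far retrieval precision can be improved. The research therefore has both theoretical significance and practical value, and its results are also made available to the teaching of Chinese as a foreign language. This work is part of the research on the theory and applications of "Intensional Logic"; combined with conceptual semantic models of nouns and verbs, it is intended for applications such as intelligent information retrieval and digital libraries.

This thesis studies adjectival modifiers in Modern Chinese and covers two main topics: the conceptual semantic model and its application. A linguistic semantic model describes the correspondence between the structural form of words and their meaning. The model correctly explains the one-to-many link between an adjective, as a feature value, and features (attribute names); this is how it represents polysemy. A set of manually annotated instance features, together with features extracted automatically from a dictionary, confirms the model's validity. Building on the model's account of semantic relations, the thesis explores computational feasibility: a dictionary-based feature bank, automatic extraction of adjectives' conceptual attributes from dictionary definitions, automatic extraction of the synonyms and antonyms associated with those attributes, and improved compilation of learner's dictionaries of Chinese as a foreign language.

First, the thesis studies how adjective polysemy is represented in Chinese. From four theories of polysemy representation it distills two main methods of semantic analysis: semantic relation analysis and semantic feature analysis. A survey of the definitions of 127 common adjectives in the Modern Chinese Standard Dictionary (《现代汉语规范词典》) shows that the usual representation is a synonym or antonym plus a relevant feature. On this basis the thesis proposes a semantic model that links "entity–feature (attribute name)–value". In the usual AVS system, a feature is linked to a concept and a value can be linked to only one feature; in our model, one value is linked to several features, and these features can largely be taken from what the dictionary compilers provide. Quantitatively, the model accounts for a considerable number of adjective–noun combinations. The specific values and features (attribute names) in the model differ from adjective to adjective. Because building the model for every adjective by hand is time-consuming, we randomly annotated the definitions of a small set of high-frequency adjectives to obtain templates for automatic extraction, then tested on the remaining adjectives and obtained fairly good precision and recall. The experimental results show that automatically generating conceptual models of adjectives from an existing text dictionary by template-based extraction is feasible.

From a lexicographic perspective, the extracted semantic models present information in some new ways compared with the original Modern Chinese Standard Dictionary. First, they collect all the synonyms, antonyms and features related to an entry, whereas in the original dictionary some of them are scattered across other entries, which makes them hard for students, especially foreign students, to look up. In addition, the models make it possible to state concisely the differences and connections between synonyms and antonyms in terms of the linked features, the entities, and the other related values. This helps foreign students use Modern Chinese adjectives correctly.

Methodologically, the thesis emphasizes the analysis of Chinese language phenomena and theoretical research on the Chinese language itself. It carries out an exhaustive case analysis of 127 high-frequency adjectives in Modern Chinese and the several thousand adjective–noun constructions related to them, applying the analytical methods of intensional logic and relevant theories of contemporary cognitive linguistics.

The main innovations of the thesis are as follows:
1. It proposes an "entity–feature–value" conceptual semantic model that represents adjective polysemy. Work within the classical AVS model has studied typicality, context dependence, negation and similar issues, but it presupposes that adjectives are monosemous, which does not match how natural language actually behaves.
2. In setting up features, it handles the relativity of features fairly successfully. Semantic features are an important part of a semantic knowledge base, and how they are defined and set up bears directly on the precision and recall of automatic feature acquisition; however, the features defined by HowNet, CCD, the Semantic Dictionary of Modern Chinese (《现代汉语语义词典》) and other Chinese resources differ in number and naming. Examining the classical theory, prototype theory and relational theory in semantics, the thesis finds that the features they propose form a continuum. Rather than first classifying nouns ontologically, it builds on the way the Modern Chinese Standard Dictionary defines adjectives: synonym or antonym plus feature.
3. It applies the model experimentally to teaching Chinese as a foreign language, where it serves as a learner's dictionary of adjective synonyms and antonyms.
4. It explores the feasibility of automatically converting a text dictionary intended for native speakers into an electronic learner's dictionary.

In computational linguistics and natural language understanding there are two basic questions: "how to compute" and "what to compute". The former concerns improving algorithms and generally relies on statistical classification; the latter focuses on building language models and often requires first classifying some basic semantic phenomena by hand. Because the semantics of Chinese adjectives is complex, this thesis takes the second approach. In building the model it concentrates on basic semantic questions such as the definition and classification of features, laying the necessary linguistic foundation for subsequent automatic extraction work.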
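The contrast drawn above between the classical AVS, where each value belongs to exactly one attribute, and the proposed entity–feature–value model, where one adjective value links to several attributes and the modified noun selects among them, can be illustrated with a minimal sketch. It assumes a Python 3.9+ representation that the thesis does not specify; the class name `AdjectiveModel`, its methods, and the attribute labels and nouns for 深 are hypothetical illustrations, not entries taken from the dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class AdjectiveModel:
    """A hypothetical Entity-Attribute-Value entry: one adjective (the value)
    linked to several attributes, each attribute selected by certain entities."""
    value: str
    # attribute name -> nouns (entities) for which the adjective denotes that attribute
    attributes: dict[str, set[str]] = field(default_factory=dict)

    def link(self, attribute: str, *entities: str) -> None:
        """Connect the value to an attribute for the given entities."""
        self.attributes.setdefault(attribute, set()).update(entities)

    def interpret(self, entity: str) -> list[str]:
        """Return the attribute(s) the adjective denotes when it modifies `entity`."""
        return sorted(a for a, ents in self.attributes.items() if entity in ents)


# Classical AVS as characterized above: each value has exactly one attribute,
# so the noun being modified cannot change the reading.
avs = {"深": "depth"}

# Entity-Attribute-Value: one value, several attributes; the entity selects the reading.
# Attribute labels and nouns are illustrative only (hypothetical data).
shen = AdjectiveModel("深")
shen.link("depth", "水", "井")        # 深水 'deep water', 深井 'deep well'
shen.link("color shade", "色", "蓝")  # 深色 'dark colour', 深蓝 'dark blue'
shen.link("lateness", "夜")          # 深夜 'late at night'

print(avs["深"])             # depth (whatever the noun)
print(shen.interpret("夜"))  # ['lateness']
print(shen.interpret("水"))  # ['depth']
```

Keying attributes by the entities that select them is one simple way to encode the one-to-many link; a fuller model would also record the synonyms and antonyms attached to each attribute.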
【Abstract】 In natural language processing and computational linguistics, adjectival modification, i.e. the attributive use of adjectives, is an active research topic. In the classical approach, both the adjective and the noun are treated as predicates, and the meaning of the adjective–noun combination is computed by the principle of compositionality. Although this is consistent with mathematical logic, it has difficulty describing the shifts in meaning that occur when an adjective modifies different nouns. In Chinese A+N combinations, the adjective denotes the value of an attribute of an entity in the objective world. Whether the semantic analysis of the whole nominal compound can represent these attributes and values at the conceptual level is therefore critical to the completeness of the concept's intension. It also determines how completely a user's information need, represented as a conceptual graph, is captured, and hence the precision of retrieval. The research is therefore significant in both theory and application, and its results are applied to Teaching Chinese as a Foreign Language (CFL). The thesis is part of the research on ‘Intensional Logic’ theory and its applications, and it can be applied in fields such as information retrieval and digital libraries.

The thesis studies adjectival modification in Mandarin, covering both the conceptual model and its application. The model describes the correspondence between the form and the meaning of words, and it explains polysemy as a one-to-many connection between the value denoted by an adjective and the attributes it can qualify. The model is validated on a set of manually tagged examples and by the automatic extraction of attributes from the dictionary.
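The template-based extraction and its evaluation can be sketched roughly as follows, under loudly stated assumptions: the definition format, the set `KNOWN_ATTRIBUTES`, the single regular-expression template and the helper names `extract` and `precision_recall` are all hypothetical and do not reproduce the thesis's actual templates or the MCSD's actual wording. The sketch only shows the general technique: a pattern learned from a small annotated sample is matched against each sense of a definition, and the extracted tuples are scored against a hand-made gold set.

```python
import re

# Hypothetical attribute names, as if collected from a small hand-annotated sample.
KNOWN_ATTRIBUTES = ["颜色", "时间", "距离", "程度"]

# Hypothetical template: "<attribute><synonym>（跟“<antonym>”相对）", antonym part optional.
# Senses within one definition are assumed to be separated by a full-width "；".
SENSE_PATTERN = re.compile(
    r"^(?P<attr>" + "|".join(map(re.escape, KNOWN_ATTRIBUTES)) + r")"
    r"(?P<syn>[^（；]+)"
    r"(?:（跟“(?P<ant>[^”]+)”相对）)?$"
)


def extract(headword: str, definition: str) -> set[tuple[str, str, str, str]]:
    """Return (headword, attribute, synonym, antonym) tuples; antonym is '' if absent."""
    found = set()
    for sense in definition.split("；"):
        m = SENSE_PATTERN.match(sense.strip())
        if m:
            found.add((headword, m["attr"], m["syn"], m["ant"] or ""))
    return found


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Standard precision and recall of predicted tuples against a gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical definition and gold annotation for 深 (illustrative, not MCSD text).
gold = {("深", "颜色", "浓", "浅"), ("深", "时间", "久", ""), ("深", "距离", "大", "浅")}
predicted = extract("深", "颜色浓（跟“浅”相对）；时间久；从上到下的距离大")
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```

The third sense does not begin with a known attribute name, so the template misses it; this is the kind of gap that lowers recall when templates learned from a small sample are applied to the remaining adjectives.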
Building on the model's semantic interpretation, the thesis explores computability: a dictionary-based attribute bank, the automatic extraction of adjectives' conceptual attributes and of the related synonyms and antonyms, and the use of these results in compiling CFL dictionaries.

First, the thesis studies how the polysemy of Chinese adjectives is represented. From four theories of polysemy representation, we derive two major methods: the semantic-relation approach and the semantic-attribute approach. A survey of the definitions of 127 adjectives in the Modern Chinese Standard Dictionary (MCSD) shows that adjectives are commonly defined by synonyms and antonyms tagged with the relevant attributes. We therefore propose an “Entity–Attribute–Value” model, in which one value is connected with several attributes whose names are largely supplied by the dictionary compilers. These attributes suffice to explain a considerable number of A+N combinations. The model differs from the common AVS, in which an attribute is connected with a concept and each value is connected with only one attribute. The specific values and attributes in the model differ from one adjective to another. Because constructing every model by hand is time-consuming, we annotated the definitions of a small random sample of high-frequency adjectives to obtain extraction patterns, then used the remaining adjectives as the test set and obtained satisfactory precision and recall. The experiments show that generating adjectival conceptual models automatically from a text dictionary is feasible.

From the point of view of computational lexicography, the extracted adjectival models present information differently from the MCSD. First, each model collects all the synonyms, antonyms and attributes related to an entry, which in the MCSD are partly scattered among other entries and are therefore hard for students, especially foreign students, to find.
Besides, the models make it easy to distinguish synonyms and antonyms in terms of the attributes, entities and other values they are connected with, which helps foreign students use the adjectives correctly.

In methodology, the thesis emphasizes the analysis of Chinese language phenomena and theoretical research on Chinese itself. It presents a case-by-case analysis of 127 high-frequency adjectives and the several thousand A+N combinations related to them. Applying the theory of ‘Intensional Logic’ and theories from cognitive linguistics, the thesis makes the following contributions.
1. It proposes the conceptual model “Entity–Attribute–Value” to represent the polysemy of adjectival modifiers. Work on earlier AVS models has addressed issues such as prototypicality, context and negation, but it commonly assumes that an adjective has only one sense, which does not match actual natural language.
2. It handles the relativity of attributes successfully. Semantic attributes are an important part of a semantic knowledge base, and how they are defined and constructed directly affects the precision and recall of automatic extraction. However, the attribute sets of HowNet, CCD and SKCC differ in both number and naming. Examining the Classical theory, the Prototype theory and the Relational theory, the thesis finds that the attributes they propose form a continuum. Rather than first constructing a nominal ontology, we extract the attributes supplied by the compilers directly from the way adjectives are defined in the MCSD.
3. The model is applied to Teaching Chinese as a Foreign Language, where it serves as a pedagogical dictionary of adjectival synonyms and antonyms.
4. The thesis explores the automatic conversion of a text dictionary into an electronic pedagogical dictionary.

In computational linguistics there are two approaches. One improves the algorithms, typically with statistical classification, while accepting existing linguistic theories, especially Western ones; the other builds language models, which often requires first classifying basic semantic phenomena by hand. We have adopted the second, i.e. proposing an adjectival model that can then be implemented with computer techniques. This is also a very important approach.
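The claim that synonyms and antonyms can be distinguished by the attributes and entities they connect to can be illustrated with a minimal sketch. It assumes (hypothetically) that each adjective is stored as a mapping from attribute names to the nouns it modifies under that attribute; the function `compare` and the entries for the near-synonyms 深 and 浓 are invented for illustration and are not data from the thesis.

```python
def compare(a: dict[str, set[str]], b: dict[str, set[str]]) -> dict[str, object]:
    """Contrast two adjective entries by shared and distinct attributes and entities."""
    shared = sorted(a.keys() & b.keys())
    return {
        "shared_attributes": shared,                   # where the two words overlap
        "only_first": sorted(a.keys() - b.keys()),     # senses unique to the first word
        "only_second": sorted(b.keys() - a.keys()),    # senses unique to the second word
        # within a shared attribute, the nouns both words can modify
        "shared_entities": {attr: sorted(a[attr] & b[attr]) for attr in shared},
    }


# Hypothetical entries (illustrative only): attribute name -> nouns modified.
shen = {"depth": {"水", "井"}, "color shade": {"色", "蓝"}, "lateness": {"夜"}}
nong = {"color shade": {"色", "绿"}, "density": {"茶", "烟", "雾"}}

result = compare(shen, nong)
print(result["shared_attributes"])  # ['color shade']
print(result["only_first"])         # ['depth', 'lateness']
print(result["only_second"])        # ['density']
print(result["shared_entities"])    # {'color shade': ['色']}
```

In this toy example the two words overlap only on colour shade, and only for the noun 色, which is the kind of concise contrast a learner's dictionary entry could present; antonyms could be compared the same way, expecting shared attributes with opposite values.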
【Key words】 Adjectives; Conceptual model; Dictionary; CFL; Intensional Logic;