Syntactic-Semantic Analysis of Ancient Chinese Based on Database Semantics (基于数据库语义学的古汉语句法语义分析研究)
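【Author】 Feng Qiuxiang (冯秋香)
【Supervisor】 Wang Rongpei (汪榕培)
【Author information】 Dalian University of Technology, Computer Application Technology, 2012, PhD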
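【Abstract (translated from the Chinese 摘要)】 Natural language processing of Chinese mostly takes modern Chinese as its object of study. Ancient Chinese, however, is an important carrier of traditional Chinese culture, and research on its automatic syntactic and semantic analysis matters for transmitting that culture to the West. Such research will also advance the automatic analysis of modern Chinese. Automatic syntactic and semantic analysis of modern Chinese has made great progress, but current general-purpose grammars and parsers show certain shortcomings when applied to the syntactic analysis and semantic representation of ancient Chinese. This thesis takes Database Semantics as its theoretical framework and Left-Associative Grammar, an important component of that framework, as its technical support. Following the time-linear principle and the descriptive principle, it analyzes the basic syntactic and semantic relations of ancient Chinese, using Zuo Zhuan (《左传》) and its English translation as the corpus. The work falls into the following parts.

First, a bilingual lexicon is built from Zuo Zhuan and its English translation according to the specific requirements of syntactic analysis in Database Semantics. Entries are stored as "proplets" (命题粒). A proplet is a non-recursive feature structure, that is, a set of attribute-value pairs. The grammatical and semantic information of a word is annotated in detail as the values of the corresponding attributes in its feature structure. This meets the needs of syntactic-semantic analysis based on Database Semantics and Left-Associative Grammar and, at the word level, lowers the rate of ambiguity caused by polysemy and word-class shift.

Second, from the perspective of natural language processing, the grammatical features and sentence patterns of Zuo Zhuan are re-examined and, in light of the needs of automatic parsing based on Left-Associative Grammar, basic syntactic rules are summarized. Starting from the basic usages of the three major word classes (nouns, verbs, and adjectives), the thesis studies basic structures such as subject-predicate, coordinate, and verb-object structures, together with their variants: object fronting, semantic passive, formal passive, and constituent omission.

Third, for function words, the thesis proposes a method of conditional transplantation. When a content word absorbs a function word, the core attribute value and/or the semantic attribute value of the function word is retained under certain conditions. This differs from the original treatment in Database Semantics and can avoid a large amount of backtracking that might otherwise arise during language production and machine translation.

Fourth, the syntactic analysis looks through the surface structure to the language content, analyzing deep semantic relations and pragmatic implications. Rule operations add new values to the semantic attribute of a word to express semantic roles and pragmatic functions such as agent, patient, experiencer, rhetorical, and passive.

We expect that the automatic syntactic-semantic analysis method based on the improved Database Semantics can later be applied to other large-scale corpora, for example ancient Chinese texts that were produced in a different era from Zuo Zhuan and have different grammatical features. Language production and machine translation built on this work are also among the directions of our follow-up research.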
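The proplet described in the abstract, a flat set of attribute-value pairs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the lexicon format of the thesis: the attribute names (`sur`, `noun`, `verb`, `cat`, `sem`) are only loosely modeled on Database Semantics conventions, and the entries for 军 and 伐, including every value in them, are hypothetical examples. Storing one proplet per reading is one way a lexicon could separate polysemy or a word-class shift at the word level.

```python
import copy
from typing import Dict, List, Union

# A proplet value is an atom or a flat list of atoms, never another proplet.
Value = Union[str, List[str]]
Proplet = Dict[str, Value]


def is_non_recursive(proplet: dict) -> bool:
    """Return True if every value is a string or a flat list of strings."""
    for value in proplet.values():
        if isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        return False
    return True


# Hypothetical bilingual lexicon: surface form -> one proplet per reading.
LEXICON: Dict[str, List[Proplet]] = {
    "军": [
        {"sur": "军", "noun": "army", "cat": ["n"], "sem": []},
        # Assumed marking for a noun used as a verb (word-class shift).
        {"sur": "军", "verb": "encamp", "cat": ["v"], "sem": ["shifted"]},
    ],
    "伐": [
        {"sur": "伐", "verb": "attack", "cat": ["v"], "sem": []},
    ],
}


def lookup(surface: str) -> List[Proplet]:
    """Return independent copies of all readings stored for a surface form."""
    return copy.deepcopy(LEXICON.get(surface, []))
```

A parser built on such a lexicon would call `lookup` for each incoming word and let the syntactic rules choose among the returned readings. How the thesis actually resolves that choice is not stated in the abstract.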
【Abstract】 Current natural language processing of Chinese is mainly concerned with modern Chinese. However, ancient Chinese is an important carrier of traditional Chinese culture, and its automatic syntactic and semantic analysis is significant for introducing that culture to the world. Such analysis of ancient Chinese will also promote that of modern Chinese.

Although the automatic syntactic and semantic analysis of modern Chinese has made considerable progress, the commonly applied grammars and existing parsers are not satisfactory when applied to ancient Chinese. This research takes Database Semantics as its theoretical framework and Left-Associative Grammar as its technical basis. Following the principles of time-linearity and descriptiveness, it investigates the basic syntactic and semantic relations in ancient Chinese, using the original text and the English translation of Zuo Zhuan as the corpus. The research is composed of the following parts.

First, we have extracted data from a self-built bilingual corpus, consisting of the original text and the English translation of Zuo Zhuan, to build a lexicon that meets the demands of syntactic-semantic analysis based on Database Semantics. In the lexicon, a word is stored as a "proplet", a non-recursive feature structure, i.e. a set of attribute-value pairs. The values of the attributes in a proplet represent the grammatical and semantic information of the word. Besides meeting the requirements of the syntactic and semantic analysis, this data structure helps to reduce ambiguity caused by polysemy and temporary shifts in part of speech.

Secondly, we have generalized the grammatical features and sentence patterns of Zuo Zhuan from the perspective of natural language processing and composed basic syntactic-semantic rules to support analysis based on Database Semantics and Left-Associative Grammar. Starting from the basic usages of nouns, verbs and adjectives, our analysis covers subject-predicate, coordination and predicate-object structures, etc., as well as variants of these basic structures, including object fronting, semantic passive, formal passive and element omission.

Thirdly, we have proposed a method of conditional transplantation for function words and other words in auxiliary position. When a function word is absorbed by a content word, the core value and/or the semantic value of the function word is retained under certain conditions. Unlike the complete absorption in the original Database Semantics, this helps to avoid, to a large extent, possible backtracking in later language production and machine translation.

Fourthly, we represent language content rather than surface structures in the derivation. We analyze deep-level semantic relations and pragmatic meanings, which are then represented as new values of the semantic attribute of the corresponding word. The additional values, such as agent, patient, experiencer, passive and rhetorical, are provided during the rule operations to indicate semantic roles and pragmatic functions.

We expect the automatic syntactic-semantic analysis based on the improved Database Semantics to be applied on a larger scale, for example to ancient Chinese texts that were produced in a different era and therefore have different features from those of Zuo Zhuan. Language production and machine translation based on this research may also be a focus of our future work.
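The contrast between complete absorption and conditional transplantation can be illustrated with a short sketch. The abstract does not state the actual retention conditions, so everything here is an assumption for illustration only: the `RETAIN_POLICY` table, the attribute names `core`, `cat`, `sem` and `fw`, and the sample words with their values are all hypothetical.

```python
import copy
from typing import Any, Dict

Proplet = Dict[str, Any]

# Hypothetical policy: which values of a function word survive absorption.
# Categories not listed here fall back to complete absorption.
RETAIN_POLICY = {
    "passive-marker": {"core": False, "sem": True},
    "preposition": {"core": True, "sem": True},
}


def absorb(content: Proplet, function: Proplet) -> Proplet:
    """Absorb a function-word proplet into a content-word proplet.

    The function word disappears as a separate proplet; its core and/or
    semantic values are copied into the result only if the policy allows.
    The input proplets are left unmodified.
    """
    merged = copy.deepcopy(content)
    policy = RETAIN_POLICY.get(function["cat"], {"core": False, "sem": False})
    if policy["core"]:
        # Keep the function word's core value in an assumed "fw" attribute.
        merged["fw"] = list(merged.get("fw", [])) + [function["core"]]
    if policy["sem"]:
        sem = list(merged.get("sem", []))
        for value in function.get("sem", []):
            if value not in sem:
                sem.append(value)
        merged["sem"] = sem
    return merged


# Illustrative proplets; the values are invented for this sketch.
verb = {"sur": "疑", "core": "doubt", "cat": "verb", "sem": []}
passive = {"sur": "见", "core": "jian", "cat": "passive-marker", "sem": ["passive"]}
noun = {"sur": "国", "core": "state", "cat": "noun", "sem": []}
prep = {"sur": "于", "core": "yu", "cat": "preposition", "sem": ["locative"]}
particle = {"sur": "也", "core": "ye", "cat": "particle", "sem": ["assertive"]}

passive_verb = absorb(verb, passive)   # keeps only the semantic value
located_noun = absorb(noun, prep)      # keeps core and semantic values
plain_verb = absorb(verb, particle)    # complete absorption, nothing kept
```

Because the retained values stay on the content word, a later generation or translation step could read them directly instead of backtracking to reconstruct which function word was absorbed. This is the motivation the abstract gives for the method.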
【Key words】 ancient Chinese; natural language processing; Database Semantics; Left-Associative Grammar; syntactic analysis;