

The Research about the Constraint Rules of Syntax Relation in Cross-Punctuation Sentence in Written Modern Chinese

【Author】 张瑞朋 (Zhang Ruipeng)

【Advisor】 宋柔 (Song Rou)

【Author information】 Beijing Language and Culture University (北京语言大学), Linguistics and Applied Linguistics, 2007, PhD

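【Abstract (translated from the Chinese)】 At present, research on Chinese syntactic analysis mostly takes the simple sentence as its object, but in real corpora the boundaries of Chinese simple sentences are very hard to determine automatically. At the sentence level, the main formal marker is punctuation. Computer processing of Chinese presupposes that the language be formalized, so the punctuation sentence naturally becomes the basic unit for processing Chinese sentences by computer. The boundaries of punctuation sentences are clear, but many punctuation sentences are syntactically incomplete, and their missing constituents have to be found in the surrounding context. There is as yet no systematic method for syntactic analysis across punctuation sentences. As a result, the analysis and generation of long Chinese sentences perform poorly, which has become a bottleneck for deep Chinese-processing applications such as machine translation between Chinese and other languages and Chinese language understanding. To solve this problem, the syntactic relations across Chinese punctuation sentences must first be carefully surveyed and analysed, and regularities and constraints must be summarized.

This work is carried out within the theoretical framework of syntactic relations across punctuation sentences. Its main goal is to identify the constituents shared across punctuation sentences, and to find out which formalizable constraint rules these relations must satisfy in addition to the stack-shaped regularity, so that they can be processed by computer.

The work has two parts.

(1) Corpus annotation, survey and statistics. The full text of Qian Zhongshu's 《围城》 was annotated: 226,641 characters and 24,115 punctuation sentences in total. The annotation covers the types of syntactic relation between punctuation sentences, the shared constituents, and the shallow syntactic structure inside each punctuation sentence. From it, statistics on the various cross-punctuation-sentence syntactic relations in the annotated corpus were obtained. Using text retrieval tools, the author also carried out a number of targeted surveys and counts over tens of millions of characters of modern and contemporary Chinese fiction.

(2) Mining of constraints. On the basis of the annotated corpus and the targeted surveys, the constraints under which cross-punctuation-sentence syntactic relations arise are summarized under more than a hundred headings, large and small. The focus is on cases where the yuanpei sentence (原配句) and the xupei sentence (续配句) are homologous and stand in forward order. The topics covered are as follows. Whether a punctuation sentence that begins with a noun or pronoun lacks a subject. For punctuation sentences with subject-verb-object structure, whether the subject of the xupei sentence is the subject or the object of the yuanpei sentence; this part discusses yuanpei sentences that are perception-verb sentences, “有” sentences, sentences whose verb takes a clausal object, serial verb constructions, “像” sentences, “V着” sentences and “V完” sentences, as well as the influence of certain connectives, adverbs, adjectives and nouns on the shared constituent. How to determine that a xupei sentence shares an adverbial of the yuanpei sentence, covering adverbials of many forms, with a separate chapter on judging the scope of negation words across punctuation sentences. How to determine that a xupei sentence shares an attributive modifier of the yuanpei sentence, covering measure words, adjectives, pronouns, nouns and noun phrases. Which constituents are shared when the yuanpei sentence is a 把 sentence or a 被 sentence. How to distinguish whether a noun phrase joined by “跟” or “和” is shared by the xupei sentence as a whole or only in part. Which constituents are shared when the yuanpei sentence is a pivotal sentence (兼语句).

The work is distinctive in the following respects. (1) Scope: besides the subject-predicate relation across punctuation sentences, which earlier work had already studied, it also examines the attributive-head, adverbial-head, verb-object, verb-complement and preposition-object relations across punctuation sentences, thereby opening up the study of the syntactic system across punctuation sentences as a whole. (2) Perspective: it concentrates on the formal features of the constraints, so the results are readily operational and lay some groundwork for automatic analysis of cross-punctuation-sentence syntactic relations by computer. (3) Method: it does not stop at illustration by example. In addition to the traditional introspective method of seeking the cognitive motivation for linguistic regularities, it stresses counts of linguistic phenomena in real corpora and uses the statistics as evidence for the reliability of the regularities.

The originality of the thesis lies mainly in its in-depth mining of linguistic features from many angles. The main points are the following.

Where the yuanpei sentence has subject-verb-object structure and the xupei sentence lacks a subject, the thesis identifies several important features that distinguish whether the xupei sentence shares the subject or the object of the yuanpei sentence. It points out that one of the main markers distinguishing a subject topic from an object topic is whether the sentence is static or dynamic, defines these two kinds of punctuation sentence formally, and describes how they relate to subject topics and object topics. According to a verb's effect on agent and patient, it divides verbs into those that affect only the agent and those that affect both agent and patient, in order to tell whether the subject changes. It introduces the notion of information content and shows that when the yuanpei sentence is a “有” sentence, or when the xupei sentence has an intermediate-state adjective as predicate, the subject of the xupei sentence depends on the information content of the object of the yuanpei sentence: the lower the information content of the object, the more likely the object is to be the subject of the xupei sentence.

Punctuation sentences are divided into independent and non-independent punctuation sentences, in order to decide whether a sharing relation holds between punctuation sentences. Nouns as a whole are divided into independent and non-independent nouns, in order to judge whether a punctuation sentence is complete. Some serial-verb predicate sentences of the main-subordinate type are reduced by sentence-pattern transformation to single-predicate sentences of the subject-verb-object type before the subject of the xupei sentence is determined. Verbal and adjectival predicates as a whole are divided into directional and non-directional predicates, in order to decide whether a coordinated noun phrase is shared as a whole or in part. Adverbials such as adverbs and time words are divided into sentence adverbials and word adverbials, in order to decide whether an adverbial is shared.

Words of every part of speech are given a fine-grained semantic classification for use in determining shared constituents across punctuation sentences. Most of these word classes have appeared in scattered places in the linguistic literature, though with different definitions and purposes; some are proposed here for the first time. The thesis uses these classes in combination, redefines some of them, and provides word lists for them within the range of high-frequency words. They include: verb classes: existential verbs, quasi-existential verbs, sensory verbs, relational verbs, cognitive verbs, psychological verbs, action verbs, causative verbs and bodily action verbs; noun classes: organ nouns (part nouns), attribute nouns, kinship nouns and psychological nouns; adjective classes: dynamic adjectives, static adjectives and intermediate-state adjectives; adverb classes: momentary-action adverbs, psychological adverbs, modal adverbs, time adverbs, conjunctive adverbs, evaluative adverbs, scope adverbs, degree adverbs and others. The thesis also proposes the notion of psychological words, comprising psychological nouns, psychological verbs, psychological adjectives and psychological adverbs. The classes proposed here for the first time are intermediate-state adjectives, momentary-action adverbs, psychological words, psychological adverbs and psychological adjectives. The classes that have appeared in the literature but are defined or delimited differently here are quasi-existential verbs, dynamic adjectives, static adjectives and modal adverbs. Parallel structure is also used to judge constituent sharing, among other methods.

Within the field of cross-punctuation-sentence syntactic relations this work is quite preliminary. Because of time limits, many problems have not been touched and many others have only been opened up. The results are still fragmentary and not systematic enough, and no work on algorithms or programs has been done. These tasks will be taken up step by step in the future.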

【Abstract】 Currently, Chinese syntactic analysis is mostly targeted at the simple sentence. However, the boundaries of Chinese simple sentences are very difficult to determine automatically in real corpora. At the sentence level, the main formal marker is punctuation. Formalization is the prerequisite of Chinese language processing, so the punctuation sentence becomes the basic unit with which a computer processes Chinese sentences. The boundaries of punctuation sentences are clear, but many punctuation sentences lack some syntactic constituents, which must be found in the context. There is still no systematic method for syntactic analysis across punctuation sentences. This leads to poor results in parsing and generating long Chinese sentences, and it has become the main difficulty for machine translation between Chinese and foreign languages and for deep Chinese language understanding. To solve this problem, we must first investigate the syntactic relations across Chinese punctuation sentences carefully and summarize rules and constraints.

This work is based on the theoretical framework of the punctuation sentence. The main purpose is to identify the constituents shared across punctuation sentences and, so that computers can process punctuation sentences conveniently, to find the formal constraint rules that the syntactic relations must satisfy besides the stack-type rule.

The work consists of two parts.

(1) Annotating the corpus and carrying out surveys and statistics. We annotated the whole of Qian Zhongshu's "WeiCheng", 226,641 characters and 24,115 punctuation sentences. The tags cover the syntactic relations between punctuation sentences, the shared constituents, and the shallow syntactic structure within each punctuation sentence, and from them we obtained statistical data on each kind of cross-punctuation-sentence relation in the annotated corpus. I also used text retrieval tools to carry out specialized investigations and statistics on tens of millions of characters of modern and contemporary Chinese novels.

(2) Mining the constraints. On the basis of the annotated corpus and the special investigations, we summarized the constraints on cross-punctuation-sentence relations under about a hundred headings, large and small. We focus on cases where the yuanpei sentence and the xupei sentence are homologous and in forward order. The contents include the following. Whether a punctuation sentence beginning with a noun or pronoun lacks a subject. If the yuanpei sentence has subject-verb-object structure, whether the subject of the xupei sentence is the subject or the object of the yuanpei sentence. Here we discuss yuanpei sentences whose predicate is a perception verb, "有", a verb taking a clausal object, a serial verb construction, "像", "V着" or "V完", as well as the effect of connectives, adverbs, adjectives and nouns on the shared constituents. How to identify an adverbial of the yuanpei sentence that is shared by the xupei sentence, covering various forms of adverbial; the scope of negative words across punctuation sentences is discussed in a separate chapter. How to identify an attributive modifier of the yuanpei sentence that is shared by the xupei sentence, covering measure words, adjectives, pronouns, nouns and noun phrases. If the yuanpei sentence is a 把 sentence or a 被 sentence, how to identify the shared constituents. How to tell whether a noun phrase connected with "跟" in the yuanpei sentence is shared by the xupei sentence as a whole or in part. If the yuanpei sentence is a jianyu (pivotal) sentence, how to identify the shared constituents.

This work is distinctive in the following respects. (1) Scope: in addition to the subject-predicate relation across punctuation sentences covered by previous studies, we also studied the attributive-head, adverbial-head, verb-object, verb-complement and preposition-object relations, opening up research on the whole syntactic system across punctuation sentences. (2) Perspective: we focus on the formal features of the constraints, so the results are easy to operationalize and lay a foundation for automatic processing by computer. (3) Method: we do not rely on examples alone. Besides using the traditional introspective method to look for the cognitive motivation of linguistic rules, we also count linguistic phenomena in real corpora and treat the statistical data as evidence for the reliability of the rules.

The major innovation of this thesis is its deep mining of linguistic features from many perspectives. The main points are as follows.

If the yuanpei sentence has subject-verb-object structure and the xupei sentence lacks a subject, how do we decide whether the xupei sentence takes the subject or the object of the yuanpei sentence? The thesis points out several important distinguishing features. One of the main indicators for distinguishing a subject topic from an object topic is whether the sentence is static or dynamic; we define both kinds of punctuation sentence formally and describe their relation to the subject topic and the object topic. According to the effect of a verb on agent and patient, verbs are divided into those that affect only the agent and those that also affect the patient, in order to distinguish whether the subject changes. We put forward the concept of information content and point out that if the yuanpei sentence is a "有" sentence, or the predicate of the xupei sentence is an intermediate-state adjective phrase, the subject of the xupei sentence depends on the information content of the object of the yuanpei sentence: the smaller the information content of the object, the more likely the object is to be the subject of the xupei sentence.

We divide punctuation sentences into independent and dependent punctuation sentences to judge whether two punctuation sentences are related to each other. We divide nouns as a whole into independent and non-independent nouns to judge whether a punctuation sentence is complete. For punctuation sentences whose predicate contains two verbs in a main-subordinate relation, the thesis uses sentence transformation to reduce them to single-predicate subject-verb-object sentences and then determines the subject of the xupei sentence. We divide verbal and adjectival predicates as a whole into directional and non-directional predicates to decide whether a coordinated noun phrase is shared as a whole or in part. We divide adverbials into sentence adverbials and lexical adverbials to judge whether an adverbial is shared. The above concepts and classifications are introduced for the first time in this thesis.

We classify the words of each part of speech in detail on semantic grounds in order to determine the shared constituents across punctuation sentences. Many of these classes have appeared in the linguistic literature, but with different definitions and purposes; some are put forward here for the first time. The thesis uses these word classes in combination, redefines some of them, and gives word lists within the range of high-frequency words. They include: verb classes: existential-presentative verbs, quasi-existential-presentative verbs, sensory verbs, cognitive verbs, mental verbs, action verbs, causative verbs, bodily action verbs; noun classes: organ nouns, attribute nouns, kinship nouns, mental nouns; adjective classes: dynamic adjectives, static adjectives, intermediate-state adjectives; adverb classes: momentary-action adverbs, mental adverbs, modal adverbs, time adverbs, conjunctive adverbs, scope adverbs, degree adverbs and so on. We also put forward the concept of mental words, including mental nouns, mental verbs, mental adjectives and mental adverbs. The word classes put forward for the first time in the thesis are organ nouns, intermediate-state adjectives, momentary-action adverbs, mental adverbs, mental nouns and mental words. The word classes that appear in the linguistic literature but are defined and delimited differently here are quasi-existential-presentative verbs, bodily action verbs, dynamic adjectives and static adjectives. We also use parallel structure to judge constituent sharing, and so on.

This work is very preliminary in the field of syntactic relations across punctuation sentences. Owing to time constraints, many issues have not been addressed and many others have only been begun. The results are still fragmentary and not systematic enough, and algorithms and programs have not been covered. These will be taken up gradually in the future.
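As an illustration of the basic unit discussed in the abstract, the following minimal Python sketch cuts running text into punctuation sentences. It is not the thesis's own procedure: the delimiter set (full-width comma, full stop, semicolon, exclamation mark, question mark and colon), the function name and the example sentence are all assumptions made here for illustration, since the abstract does not list which marks delimit a punctuation sentence.

```python
import re

# Assumed delimiter set; the abstract does not specify which marks
# end a punctuation sentence.
DELIMITERS = ",。;!?:"

# One punctuation sentence = a run of non-delimiter characters
# followed by at most one delimiter.
_PATTERN = re.compile(f"[^{DELIMITERS}]+[{DELIMITERS}]?")


def split_punctuation_sentences(text: str) -> list[str]:
    """Return the punctuation sentences of `text`, each keeping its closing mark."""
    segments = (match.group(0).strip() for match in _PATTERN.finditer(text))
    return [segment for segment in segments if segment]


if __name__ == "__main__":
    # Invented example, not taken from the annotated corpus.
    for segment in split_punctuation_sentences("他买了一本书,很厚,看了三天才看完。"):
        print(segment)
```

In the invented example, the second and third punctuation sentences have no overt subject. Read naturally, "很厚" takes the object of the first punctuation sentence ("书") as its subject, while "看了三天才看完" takes the subject ("他"). Deciding between the two is the kind of shared-constituent question the constraint rules are meant to answer; the sketch only performs the segmentation step.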

  • 【Classification number】 H146
  • 【Times cited】 6
  • 【Downloads】 533

