

Study on Several Key Problems in the Machine Translation System Based on Semantic Language

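【Author】 Guan Xiaowei (关晓薇)

【Advisor】 Gao Qingshi (高庆狮)

【Author Information】 Dalian University of Technology, Computer Application Technology, 2009, Ph.D.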

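【Abstract】 To date, no fully automatic, high-quality machine translation system has appeared, because linguistic knowledge is complex and human understanding of the regularities of language is limited. Machine translation research is no longer confined to the syntactic and semantic analysis of single sentences; it must also exploit the internal regularities of language and examine contextual information of all kinds, including sentence groups, paragraphs, discourse, and genre. The multi-language machine translation method based on Semantic Language analyzes a source-language sentence semantically against a Semantic Element Base and then expands the result into a target-language sentence. The system has two parts: unified machine translation software for multiple natural languages, and a unified multilingual Semantic Element Base that is high-quality, extensible, complete, and free of discardable entries, repetition, and abnormal ambiguity. However, because the source corpora are limited and language phenomena are complex and variable, the existing Semantic Element Base is not complete (some semantic elements and their representations are missing or incorrect), so it cannot guarantee that every sentence is translated, or translated correctly. The system also still has several difficult key problems that urgently need solving. To improve translation quality and obtain correct, coherent translations, this thesis studies three of these key problems, as follows.

(1) Classifiers. First, based on the semantic element theory of unified linguistics, English and Chinese nominal classifiers are given a unified classification, in which the different semantic element representations of a classifier in English and Chinese correspond to the same classifier semantic type. Second, noun-classifier phrases are formalized into Chinese and English semantic element representations based on semantic element theory. Finally, the nouns that collocate with formalized classifiers are represented semantically using the lexical definitions in HowNet, a formalized classifier-noun collocation rule base is built, and a method for selecting Chinese formalized classifiers in the system, based on the rule base and noun examples, is proposed and implemented.

(2) Semantic disambiguation and Chinese translation of English prepositions. First, based on semantic element theory and the characteristics of phrases and sentences containing prepositions, a preposition-specific semantic element (representation) base, the Preposition Semantic Pattern Base, is constructed. Prepositions are then translated in two steps: the first step analyzes an English phrase or sentence containing a preposition against the Semantic Pattern Base to obtain a complete English semantic pattern; the second step substitutes and expands that pattern, again using the Semantic Pattern Base, into a complete Chinese semantic pattern and the Chinese translation. Finally, three preposition-specific methods are proposed for the semantic analysis stage. The first combines Link Grammar with semantic patterns: a correspondence is established between phrases and sentences containing prepositions and Link Grammar connectors, and the output of the Link Grammar Parser is matched against it to obtain the grammatical structure of the sentence. The second is semantic pattern decomposition: the four basic forms of phrases and sentences containing prepositions are decomposed into simple preposition semantic patterns. The third is semantic pattern extension: starting from the preposition, verbs, nouns, and adjectives are extended layer by layer to the left and right to obtain the actual extended form. Experimental results confirm that the Preposition Semantic Pattern Base and the three analysis methods are effective for the semantic analysis and Chinese translation of prepositions in the system.

(3) Chinese-English tense conversion. First, based on an analysis of Chinese-English bilingual sentence pairs and a summary of how Chinese sentences express time, the temporal information of Chinese sentences is newly classified, starting from the reverse mapping from the 16 English tenses to Chinese temporal expressions; this classification avoids the complexity and overlapping categories of traditional tense classifications. Second, the concept of a temporal pattern is introduced and used to formalize each class of temporal information, and a Temporal Information Pattern Base is constructed. Finally, a multi-strategy method for temporal analysis and English translation of Chinese sentences is proposed, combining a temporal analysis algorithm for Chinese simple sentences, one for sentences marked by Chinese connectives, and one for quasi-subjunctive sentences with discourse information recognition rules. This gives the system a feasible method of tense conversion.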

【Abstract】 Up to now, no fully automatic, high-quality machine translation system has been developed. The difficulty lies mainly in the complexity of language knowledge and our limited understanding of the rules of human languages. At present, the study of machine translation is not confined to the semantic and syntactic analysis of single sentences, but needs to explore the inherent laws of language and context information such as sentence groups, paragraphs, texts, and styles. Multi-language machine translation based on Semantic Language analyzes source-language sentences semantically with reference to the Semantic Element Base and expands the result into the target language to produce the translation. The machine translation system is composed of two parts: one is unified multi-language machine translation software; the other is a multi-language Semantic Element Base that is high-quality, expandable, complete, and free of discardable elements, repetition, and abnormal ambiguity. However, limited by the corpora and by complex and flexible language phenomena, the present Semantic Element Base is not complete (some Semantic Elements and their representations are missing or wrong) and cannot guarantee a correct translation for every sentence. Furthermore, some difficulties in the system still need to be resolved. To improve translation quality and obtain correct translations, a series of studies has been carried out on three of the key problems, as follows.

(1) The first key problem studied is classifiers. First, based on the theory of Semantic Elements in unified linguistics, a new unified classification of English and Chinese nominal classifiers was proposed, in which the different Semantic Element representations of a classifier in English and Chinese share the same classifier semantic type. The noun-classifier phrases were then formalized into English and Chinese Semantic Element representations. Next, the nouns that collocate with the formalized classifiers were represented semantically using the lexical definitions in HowNet, and a Formalized Classifier-Noun Collocation Rule Base was constructed. Finally, a method for selecting Chinese formalized classifiers based on the Collocation Rule Base and noun examples was proposed and implemented.

(2) The second key problem is the semantic disambiguation and Chinese translation of English prepositions. First, based on the theory of Semantic Elements and the characteristics of phrases and sentences with prepositions, the Semantic Pattern Bases of English prepositions (Semantic Element Representation Bases specific to prepositions) were built. Prepositions were then translated in two steps: the first obtained the complete English semantic pattern by semantic analysis based on the Semantic Pattern Base; the second expanded it into the complete Chinese semantic pattern and the Chinese translation. Finally, three semantic analysis methods specific to prepositions were proposed for the semantic analysis step. The first method was based on Link Grammar and semantic patterns: the parsing results of the Link Grammar Parser were matched against the correspondence between phrases and sentences with prepositions and the connectors to obtain the grammatical structure. The second method was based on semantic pattern decomposition: the four basic forms of phrases and sentences with prepositions were decomposed into simple semantic patterns. The third method was based on semantic pattern extension: verbs, nouns, and adjectives were extended outward from the preposition to obtain the actual extended form. The test results show that the Semantic Pattern Bases of English prepositions and the three semantic analysis methods are effective for the semantic analysis and Chinese translation of English prepositions in the machine translation system based on semantic language.

(3) The third key problem is Chinese-English temporal transfer. First, based on an analysis of bilingual sentence pairs and a summary of the regularities of temporal expressions, a new classification of Chinese temporal information was proposed in light of the reverse mapping from the sixteen English tense and aspect forms to Chinese temporal expressions. This classification avoids the complexity and overlapping categories of traditional classifications. Then the concept of the Temporal Pattern was introduced to formalize each type of temporal information, and the Temporal Pattern Base was constructed. Finally, the temporal analysis and translation of Chinese sentences in the system were handled by combining temporal analysis algorithms for Chinese simple sentences, conjunction-marked sentences, and subjunctive-like sentences with context rules. This provides a feasible solution for temporal transfer in the machine translation system based on semantic language.
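The classifier selection described in item (1) can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the thesis's actual data or algorithm: the noun categories are a toy stand-in for HowNet lexical definitions, the rule table and example table are hypothetical, and the lookup order (noun examples first, then category rules, then a default) is only one plausible way to combine a rule base with noun examples.

```python
# Toy stand-in for HowNet lexical definitions: noun -> semantic category.
# These categories are illustrative assumptions, not actual HowNet entries.
NOUN_CATEGORY = {
    "书": "publication",   # book
    "杂志": "publication",  # magazine
    "马": "animal",        # horse
    "狗": "animal",        # dog
    "桌子": "furniture",    # table
}

# Hypothetical collocation rules: semantic category -> Chinese classifier.
CATEGORY_RULES = {
    "publication": "本",
    "furniture": "张",
    "animal": "只",
}

# Hypothetical noun examples that override the category rule
# (a horse conventionally takes 匹 rather than the general animal classifier).
NOUN_EXAMPLES = {
    "马": "匹",
}

# General-purpose classifier used when nothing else matches.
DEFAULT_CLASSIFIER = "个"


def select_classifier(noun: str) -> str:
    """Pick a classifier: noun example first, then category rule, then default."""
    if noun in NOUN_EXAMPLES:
        return NOUN_EXAMPLES[noun]
    category = NOUN_CATEGORY.get(noun)
    if category in CATEGORY_RULES:
        return CATEGORY_RULES[category]
    return DEFAULT_CLASSIFIER


if __name__ == "__main__":
    for noun in ["书", "马", "狗", "电脑"]:
        print(noun, select_classifier(noun))
```

Checking noun examples before category rules lets an individual noun carry an exception without complicating the rules; whether the thesis orders the two sources this way is not stated in the abstract.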


