基于语义语言的英汉机器翻译研究

Study on English-Chinese Machine Translation Based on Semantic Language Theory

【Author】 浑洁絮

【Advisors】 高庆狮; 黄德根

【Author information】 Dalian University of Technology, Computer Application Technology, 2011, PhD

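【Abstract (translated from the Chinese)】 Machine translation has long been a hard problem worldwide. Its difficulty lies in the complexity of linguistic knowledge and in our limited understanding of the regularities of language. Experience with machine translation has shown that high-quality output requires analyzing and understanding the semantics of natural language. From the perspective of Semantic Language theory, this thesis studies several difficult problems in English-Chinese machine translation, in three main areas. (1) English that is easily misunderstood and mistranslated. Based on a large number of examples, easily misunderstood and mistranslated English words and expressions are given a formal semantic analysis and studied empirically. The concepts of the Refraction Phenomenon in semantic correspondence between natural languages and of the Semantic Module are proposed; seven kinds of Refraction Phenomenon in English-Chinese semantic correspondence are identified; the degree of misunderstanding and mistranslation under each kind is analyzed and compared; and nine aspects in which national culture gives rise to the Refraction Phenomenon are summarized. The results show that the English expressions subject to the Refraction Phenomenon in English-Chinese semantic correspondence are exactly the English words and expressions most easily misunderstood and mistranslated. This conclusion can be applied to machine translation based on Semantic Language to improve translation accuracy. (2) Recognition and Chinese translation of English metaphor. The concept of the Semantic Grammar Pattern, rules for extracting such patterns, and a method for extracting them automatically are proposed. Taking human body words as the object of study, and targeting the characteristics of their metaphorical use in English, a Semantic Grammar Pattern Set, a Set Phrases Set, and a Variable Representation Base specific to English human body words are constructed. The Semantic Grammar Pattern Set comprises subsets including a grammatical metaphor pattern set, a lexical metaphor pattern set, a literal meaning pattern set, a phrase pattern set, and a sentence construction pattern set. A unification algorithm for English metaphor recognition and Chinese translation, based on the Semantic Grammar Pattern Set, the Set Phrases Set, and the Variable Representation Base, is proposed; experiments show that the algorithm is effective for this task. (3) Extraction and application of templates with syntactico-semantic variables. Translation templates with syntactico-semantic variables and a method for extracting them automatically are proposed. A syntactico-semantic type tree is constructed, and the syntactico-semantic types of the variables in a translation template are determined from this tree. Experiments show that, compared with translation templates with grammatical variables, translation templates with syntactico-semantic variables raise the BLEU score by 0.08 for English-Chinese and by 0.05 for Chinese-English machine translation. The templates can be used in machine translation systems based on Semantic Language and also in other EBMT systems. Because some templates with syntactico-semantic variables are themselves semantic elements, extracting such templates can also enrich the semantic element base.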

【Abstract】 Machine translation has long been a difficult problem worldwide. The difficulty lies mainly in the complexity of linguistic knowledge and in our limited understanding of the rules of human languages. Work on machine translation has shown that high-quality translation requires semantic analysis and understanding of natural language. Based on Semantic Language theory, this thesis addresses several important and difficult problems in English-Chinese machine translation, in three main areas. (1) English words and expressions that are often misunderstood and mistranslated. Based on a large number of examples, a study of the kinds of English words and expressions that Chinese speakers often misunderstand is reported. The concepts of the Semantic Module and of the Refraction Phenomenon in natural languages are presented. Seven situations of the Refraction Phenomenon between English and Chinese are identified, and the English words and expressions in the seven situations are compared. Nine cultural factors that cause the Refraction Phenomenon are summarized. Experimental results show that the English expressions involved in the Refraction Phenomenon are the most easily misunderstood English words and expressions. This conclusion can be applied to machine translation to improve translation accuracy. (2) Metaphor recognition and Chinese translation of English words. The concept of the Semantic Grammar Pattern, rules for extracting Semantic Grammar Patterns, and a series of methods for extracting them are presented. Taking the most frequently used English Human Body Words as the object of study, a Variable Representation Base, a Set Phrases Set, and a Semantic Grammar Pattern Set of English Human Body Words are constructed. The Semantic Grammar Pattern Set includes the Grammar Metaphor Pattern Set, the Lexical Metaphor Pattern Set, the Literal Meaning Pattern Set, and the Semantic Grammar Pattern Set of English Phrases and Sentences, among others. Based on these, a new algorithm for metaphor recognition and Chinese translation is presented. Test results show that the algorithm is effective for metaphor recognition and Chinese translation of English words. (3) Induction and application of translation templates with syntactico-semantic type constraints. Translation templates with syntactico-semantic typed variables, and a learning technique that induces such templates, are presented. The syntactico-semantic types of the variables are induced using syntactico-semantic type trees. Test results show that translation templates with syntactico-semantic typed variables yield higher BLEU scores than translation templates with grammar typed variables, by 0.08 for English-Chinese and by 0.05 for Chinese-English machine translation. Translation templates with syntactico-semantic typed variables can be implemented as part of a multi-language machine translation system and also as part of other EBMT systems. Since many translation templates with syntactico-semantic typed variables are semantic elements, extracting translation templates with syntactico-semantic type constraints also helps to enrich the semantic element representation base.
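
As a rough illustration of point (3), the following Python sketch shows how a translation template whose variables carry syntactico-semantic types might be matched against a sentence using a type tree. It is a toy under stated assumptions, not the thesis's implementation: the type names (`N.human`, `N.food`, ...), the lexicon, the `Template` class, and the `translate` function are all hypothetical and invented for this example.

```python
from dataclasses import dataclass

# Hypothetical syntactico-semantic type tree, stored as child -> parent.
TYPE_PARENT = {
    "N.human": "N.animate",
    "N.animate": "N",
    "N.food": "N.object",
    "N.object": "N",
    "N": None,
}

def is_subtype(t, ancestor):
    """Walk up the type tree from t and report whether ancestor is reached."""
    while t is not None:
        if t == ancestor:
            return True
        t = TYPE_PARENT.get(t)
    return False

# Toy lexicon (an assumption): English word -> (type, Chinese translation).
LEXICON = {
    "teacher": ("N.human", "老师"),
    "apple": ("N.food", "苹果"),
    "stone": ("N.object", "石头"),
}

@dataclass
class Template:
    source: list  # literal tokens, or (variable_name, required_type) tuples
    target: list  # Chinese tokens; variables are referenced by name

def translate(template, tokens):
    """Return the Chinese output if tokens match the template, else None."""
    if len(tokens) != len(template.source):
        return None
    bindings = {}
    for slot, tok in zip(template.source, tokens):
        if isinstance(slot, tuple):
            name, required = slot
            entry = LEXICON.get(tok)
            # The variable binds only if the word's type is at or below
            # the required type in the type tree.
            if entry is None or not is_subtype(entry[0], required):
                return None
            bindings[name] = entry[1]
        elif slot != tok:
            return None
    return "".join(bindings.get(t, t) for t in template.target)

# "the X1 eats the X2", with X1 animate and X2 food.
EAT = Template(
    source=["the", ("X1", "N.animate"), "eats", "the", ("X2", "N.food")],
    target=["X1", "吃", "X2"],
)

print(translate(EAT, "the teacher eats the apple".split()))
print(translate(EAT, "the teacher eats the stone".split()))
```

In this toy, a template whose variables were typed only as `N` would also accept "stone" as the object of "eats"; the finer type on `X2` is what rejects it. This is one plausible reading of why typed variables could help, not a reproduction of the reported BLEU gains.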
