Research and Implementation of Key Techniques in the Chinese Text-to-Speech System HJ-TTS

The Study and Implementation of Some Key Techniques in the Chinese Text-to-Speech System

【Author】 Zhang Dajun

【Advisors】 Chen Zhaoxiong; Huang Heyan

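【Author information】 Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology), Computer Application Technology, 2000, Ph.D.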

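【Abstract (translated from Chinese)】 A text-to-speech (TTS) system is an important component of the human-computer interface and a difficult problem in Chinese information processing. Taking the Huajian speech translation system as its research background, this dissertation studies in depth two aspects of text-to-speech conversion: linguistic processing and prosodic representation.

The goal of text-to-speech conversion is to automatically convert text stored in a computer into speech output. With the development of speech synthesis, the technology has gradually become practical, and it has broad application prospects in information dissemination systems, automatic voice response systems, voice e-mail systems, voice services for people with disabilities, and similar fields. Research on it has considerable theoretical significance and practical value for human-computer speech communication, natural language human-computer interfaces, and the development of intelligent computer systems.

With the development of computational linguistics and phonetics, text-to-speech technology has advanced considerably. It has progressed from shallow analysis of the speech signal alone to the combined use of linguistic and phonetic knowledge, and from mechanical speech synthesis to computer-based speech output. Even so, current text-to-speech technology remains unsatisfactory in several respects: the linguistic processing component cannot yet perform deep semantic analysis of the text, and so cannot supply the later synthesis stages with the information they need at the level of text understanding; the segmental features of speech are difficult to extract and analyze quantitatively; and the standards and methods for evaluating synthesized speech are still incomplete.

To improve the performance of the text-to-speech system, the author describes and analyzes the linguistic processing component in detail; enriches and improves the rules for sentence boundary determination, special symbol processing, and related tasks; implements, based on the characteristics of word segmentation in text-to-speech systems, a feature-set-based speech word segmentation algorithm and a segmentation disambiguation strategy; and, to address the description of polyphonic character pronunciation, improves the lexicon structure and the storage of polyphonic characters and designs a polyphonic character screening algorithm on that basis.

Breath-group boundary division, as one manifestation of Chinese prosody, plays a very important role in speech perception and in the performance of text-to-speech systems. Based on a study of the relationships among Chinese syntactic structure, sentence length, and breath-group division, the author designs and implements a breath-group boundary division algorithm based on sentence length and syntactic structure. To better describe the text-to-speech process, the author proposes SSML, a prosody markup language suited to Chinese speech synthesis. This markup language describes the prosody of speech at the acoustic level and also annotates syntactic structure information at the linguistic level. Finally, the dissertation describes prosodic rules based on language understanding in SSML and verifies experimentally the effect of linguistic-level annotation on the naturalness of the system.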

【Abstract】 Text-to-Speech (TTS) is an important part of the human-computer interface and a difficult problem in Chinese information processing. This dissertation presents our research on the language processing and prosodic representation of a Text-to-Speech system that serves as a component of the Huajian Speech-to-Speech translation system.

A Text-to-Speech system converts text stored in a computer into speech. With improvements in speech synthesis techniques, it has been widely used in areas such as information systems, voice response devices, voice e-mail services, and reading machines for the blind. Moreover, it has great theoretical value for human-computer communication systems, natural language human-computer interfaces, and intelligent computer systems.

Text-to-Speech has made great progress with the development of computational linguistics and phonetics. It has evolved from analyzing speech signals to utilizing knowledge of linguistics and phonetics. Despite this, Text-to-Speech technology still has many weaknesses: first, without deep semantic analysis it cannot provide enough information to the synthesis stage; second, segmental characteristics are difficult to extract and analyze; finally, the evaluation criteria are far from perfect.

This dissertation first discusses and analyzes the language processing procedure; the author then enriches the rule sets for sentence boundary assignment, special symbol processing, and so on. Drawing on the characteristics of Chinese Text-to-Speech systems, a new speech word segmentation algorithm and a disambiguation method are also put forward. The word-to-sound rule and the structure of a speech database are modified during the
