

A Study on Lhasa Tibetan Prosodic Model of Journalese

【Author】 陈琪 (Chen Qi)

【Advisor】 于洪志 (Yu Hongzhi)

【Author information】 Northwest University for Nationalities (西北民族大学), Computer Application Technology, 2011, Master's thesis

【Abstract】 With the development of waveform concatenation techniques based on large-scale corpora, research on speech synthesis systems has made significant progress, and the intelligibility of synthesized speech now meets the needs of practical applications. The naturalness of synthesized speech, however, remains unsatisfactory, mainly because the prosodic models used in synthesis systems are still incomplete. To eliminate the difference between synthesized speech and natural human speech flow, and thereby synthesize speech with higher naturalness, a high-quality prosodic model must be established. The currently popular approach is data-driven: the model is trained on a large amount of corpus data so that it outputs high-quality prosodic control parameters and improves the naturalness of the synthesized speech.

The realization of human spoken speech flow is constrained by physiological mechanisms, chiefly respiratory regulation, and respiration is an important cue for dividing prosodic levels. Studying the relationship between respiration and prosodic levels, determining the respiratory signal parameters that affect prosodic features, and using them together with speech signal parameters as training data is a new approach to prosodic modeling and a new attempt at building a high-quality prosodic model.

To meet the practical development needs of Tibetan speech synthesis, this thesis uses news text as the training corpus, studies the phonetic and prosodic characteristics of Lhasa Tibetan, determines the respiratory signal parameters that affect prosodic features, and uses an RBF neural network to build a prosodic model of Lhasa Tibetan journalese that predicts prosodic control parameters. The main work is as follows:

1. The phonetic characteristics of Lhasa Tibetan were studied. Drawing on existing research on Chinese prosodic structure, the prosodic levels of Tibetan were determined, the prosodic structure of Tibetan and its characteristics were analyzed, and the parameters that reflect prosodic features were selected as the input parameter set of the prosodic model.

2. One year of the Tibet Daily (《西藏日报》) was collected. The text was designed and optimized according to the characteristics of Lhasa Tibetan so that the selected corpus largely covers its segmental and suprasegmental features. After normalization and recording, prosodic labeling rules suited to Tibetan were designed, and a corpus for the Lhasa Tibetan prosodic model was built.

3. Based on the physiological mechanism of human phonation, the way respiratory signals change during phonation was studied. Data analysis established the correspondence between respiratory signals and prosodic features, and the relevant parameters were collected as training parameters for the model.

4. Based on the preceding analysis of prosodic structure, 6 groups of context information parameters totaling 39 dimensions were determined to reflect prosodic features. An RBF neural network was used to build the prosodic model, which outputs 10-dimensional prosodic control parameters. The model was trained and tested on the labeled material in the corpus, and the results indicate that it has good predictive performance.
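The RBF-network step in item 4 can be illustrated with a minimal sketch. Only the input and output sizes (39 and 10) come from the abstract. Everything else is an assumption made for illustration and is not the author's method: the class name `RBFNetwork`, the number of centers, the choice of centers as a random subset of the training inputs, the shared Gaussian width, the least-squares output layer, and the synthetic data that stands in for the labeled corpus.

```python
import numpy as np


class RBFNetwork:
    """Gaussian RBF network with a linear output layer (illustrative only)."""

    def __init__(self, n_centers=40, seed=0):
        self.n_centers = n_centers  # hypothetical choice, not from the thesis
        self.seed = seed

    def _hidden(self, X):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * self.width_ ** 2))
        # Append a constant column so the output layer has a bias term.
        return np.hstack([phi, np.ones((X.shape[0], 1))])

    def fit(self, X, Y):
        rng = np.random.default_rng(self.seed)
        # Assumption: centers are a random subset of the training inputs.
        idx = rng.choice(X.shape[0], size=self.n_centers, replace=False)
        self.centers_ = X[idx]
        # Assumption: one shared width, the mean distance between centers.
        diff = self.centers_[:, None, :] - self.centers_[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        self.width_ = dist[np.triu_indices(self.n_centers, k=1)].mean()
        # Output weights fitted by ordinary least squares.
        self.weights_, *_ = np.linalg.lstsq(self._hidden(X), Y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.weights_


# Synthetic stand-in data; the real corpus features are not reproduced here.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 39))                      # 39-dim context features
Y = np.tanh((X @ rng.normal(size=(39, 10))) * 0.3)  # 10-dim control parameters

model = RBFNetwork().fit(X, Y)
pred = model.predict(X)

# Compare against a trivial baseline that always predicts the target mean.
mse_model = float(np.mean((pred - Y) ** 2))
mse_baseline = float(np.mean((Y - Y.mean(axis=0)) ** 2))
print(pred.shape, mse_model, mse_baseline)
```

In practice the centers are often chosen by clustering and the widths tuned on held-out data. The random-subset and mean-distance choices here only keep the sketch short.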
