
Research on a Syllable-Based Method of Tone Recognition for Chinese Continuous Speech

SYLLABLE-BASED METHOD OF TONE RECOGNITION FOR CHINESE CONTINUOUS SPEECH

【Author】 钟金宏 (Zhong Jinhong)

【Advisor】 杨善林 (Yang Shanlin)

【Author information】 合肥工业大学 (Hefei University of Technology), Computer Application Technology, 2001, PhD

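【Summary】 Tone is one of the principal attributes of Chinese. It serves to form words, distinguish meaning, and improve expressiveness, and it matters for speech recognition, speech synthesis, and natural language understanding.
In recent years automatic speech recognition has made breakthrough progress, and many kinds of speech recognition systems have appeared. Research has also shifted toward large-vocabulary, speaker-independent continuous speech recognition and natural language understanding. Existing Chinese speech recognition systems make essentially no use of tone information, and tone recognition research is mostly limited to isolated characters and multi-character words. Tone patterns and tone recognition in continuous speech have received little study, and this dissertation does some work in that area.
Tone recognition in Chinese continuous speech is harder than tone recognition for isolated characters and multi-character words. This dissertation proposes syllable-based tone recognition and studies the problems it involves: syllable segmentation, tone acquisition, feature extraction, tone pattern analysis, and the tone recognition model. The main contents are as follows:
(1) Syllable segmentation in Chinese continuous speech, studied with fractal theory and waveform cross-correlation. The syllable is chosen as the unit of tone recognition, which introduces the problem of syllable segmentation. Segmenting syllables in a continuous speech stream is very difficult. Starting from the chaotic nature of speech signals, the dissertation applies fractal theory to the problem, proposes a syllable segmentation method based on the variance fractal dimension, and analyzes its performance in detail. The method handles the boundaries between silence and speech and between voiced and unvoiced sounds well, but it struggles with boundaries between voiced sounds: when voiced sounds are adjacent and the transition segment is short, it cannot separate them. To address this, the dissertation exploits the difference between the waveforms of transition and non-transition segments, proposes a syllable segmentation method based on waveform cross-correlation, and works through examples.
(2) Wavelet-based extraction of the fundamental frequency (pitch). Tone is a pattern of fundamental frequency variation, so tone information can be obtained through pitch extraction. Many pitch extraction methods exist; this dissertation adopts the wavelet transform method. It gave good results for some speech, but for most speech the extracted pitch contained many errors. After careful analysis the method was improved, yielding a new wavelet-based pitch detection algorithm. The algorithm relies on two properties: pitch points propagate across the resolution levels of the wavelet transform, and their positions are similar at different scales. It therefore selects pitch points by voting. Its main steps are: compute the wavelet transform at five (or three) scales; select pitch points by voting; check the pitch; relocate the pitch points.
(3) Feature extraction for tone recognition. Feature extraction is a basic problem in pattern recognition. Effective features capture the important information in a pattern while reducing computation and the misrecognition rate. Chinese tone is determined mainly by the tone level and the direction of the pitch contour. The dissertation therefore chooses the head-tail difference and the relative tone-level ratio as tone features for syllables in three-character words, and the head-tail difference and the tone level at the syllable onset as tone features for syllables in continuous speech.
(4) Tone pattern analysis. In continuous speech the tone features of each syllable vary considerably under the influence of the preceding and following syllables, so the tone patterns are more complex and retain only the basic characteristics of the four tones. Correctly analyzing these tone patterns and tone sandhi rules is important for tone recognition in Chinese continuous speech. The dissertation reviews the tone patterns of isolated characters and two-character words, analyzes the tone patterns of three-character words qualitatively and quantitatively, and on that basis studies the tone patterns of continuous speech.
(5) Selection and design of the tone recognition model. Tone patterns in Chinese continuous speech are complex and variable, and a single fixed recognition model cannot solve the tone recognition problem for continuous speech. The dissertation uses a fuzzy neural network with online learning capability as the tone recognition model and proposes a tone recognition method based on fuzzy adaptive resonance theory mapping (fuzzy ARTMAP).
(6) The proposed ideas and methods are verified on three-character words and continuous speech examples.
The research results of the dissertation are as follows:
(1) Based on the characteristics of Chinese, the idea of syllable-based tone recognition for Chinese continuous speech is proposed.
(2) Based on the chaotic nature of speech signals, a syllable segmentation method based on the variance fractal dimension is proposed; to handle the difficulty of separating adjacent voiced sounds, a syllable segmentation method based on waveform cross-correlation is proposed.
(3) In response to the problems that the traditional wavelet transform method showed in pitch detection experiments, a voting strategy is introduced and a new wavelet-based pitch detection algorithm is proposed.
(4) Based on the characteristics of Chinese tone contours, the head-tail difference and the relative tone-level ratio are chosen as tone features for each syllable of a three-character word; the head-tail difference and the tone level at the syllable onset are chosen as tone features for syllables in continuous speech.
(5) The tone patterns of three-character words are analyzed qualitatively and quantitatively, which confirms the known regularities of tone pattern change in three-character words and yields some new tone sandhi rules for them. For the tone patterns of Chinese continuous speech, the following views are put forward: syllable tone patterns in continuous speech can be based on the tone patterns of two-character and three-character words; a syllable in continuous speech can be regarded as influenced only by its preceding and following syllables, and the tones of syllables on either side of a sufficiently long gap can be treated as unrelated; syllable tone patterns in continuous speech can be grouped into three classes (head, middle, and tail), and modeling these three classes can solve the tone recognition problem for continuous speech.
(6) To cope with the complexity of continuous speech, the view is proposed that a fuzzy neural network with online learning capability should serve as the tone recognition model. On that basis, a tone recognition method based on fuzzy adaptive resonance theory mapping is proposed.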
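The variance fractal dimension used for syllable segmentation in item (1) can be illustrated with a minimal sketch. It is not the dissertation's implementation, which the abstract does not give. It assumes the common definition in which the variance of amplitude increments scales as lag^(2H) and the dimension of a one-dimensional signal is D = 2 - H. The function names, the lag set, and the idea of evaluating one frame at a time are all assumptions made for illustration.

```python
import math


def _least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def variance_fractal_dimension(frame, lags=(1, 2, 4, 8)):
    """Estimate the variance fractal dimension of one frame of samples.

    Assumed model: Var[x(t + k) - x(t)] ~ k ** (2 * H), and D = 2 - H.
    `frame` must be longer than the largest lag; `lags` is a hypothetical
    default, not a value taken from the dissertation.
    """
    log_lags, log_vars = [], []
    for k in lags:
        diffs = [frame[i + k] - frame[i] for i in range(len(frame) - k)]
        if not diffs:
            continue
        mean = sum(diffs) / len(diffs)
        var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
        if var > 0.0:  # skip lags whose increments are constant
            log_lags.append(math.log(k))
            log_vars.append(math.log(var))
    if len(log_lags) < 2:
        return float("nan")  # not enough points to fit a slope
    hurst = 0.5 * _least_squares_slope(log_lags, log_vars)
    return 2.0 - hurst


# A smooth, slowly varying signal should give a dimension close to 1.
smooth = [math.sin(2 * math.pi * i / 256) for i in range(4096)]
print(variance_fractal_dimension(smooth))
```

Under this definition, noise-like frames (silence, unvoiced sounds) yield values near 2 and smooth periodic frames (voiced sounds) yield values near 1, which is consistent with the abstract's remark that the measure separates those classes but not two adjacent voiced sounds.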

【Abstract】 Tone is one of the primary properties of Chinese. It serves to construct words, distinguish meaning, and improve expressiveness, and it is important to speech recognition, speech synthesis, and natural language understanding.
In recent years great progress has been made in automatic speech recognition, and many speech recognition systems have been developed. Research now turns to large-vocabulary, speaker-independent continuous speech recognition and natural language understanding. Current Chinese speech recognition systems make essentially no use of tone information, studies of tone recognition are limited to isolated words and multi-syllabic words, and there is little research on tone patterns and tone recognition for Chinese continuous speech.
In this dissertation a syllable-based method of tone recognition for continuous speech is proposed. The method includes the following procedures: syllable segmentation, pitch detection, feature extraction, tone pattern analysis, and tone recognition. The research is summarized as follows:
(1) Syllable segmentation in continuous speech. The syllable is taken as the tone recognition unit in this thesis, so syllable segmentation must be performed. Syllable segmentation in continuous speech is very difficult. In accordance with the chaotic nature of speech signals, syllable segmentation in continuous speech is studied using fractal theory. An approach to syllable segmentation using the variance fractal dimension is proposed, and its performance is analyzed in detail. The method can discriminate between silence and speech and between unvoiced and voiced sounds, but it can hardly separate adjacent voiced sounds. Based on the difference in speech waveform between transition and non-transition segments, the separation of voiced sounds is studied using waveform cross-correlation, and a syllable segmentation method based on waveform cross-correlation is presented.
(2) Pitch detection of speech signals. Mandarin tone is a pattern of pitch variation, so it can be acquired by pitch extraction. Many methods of pitch detection have been developed; in this thesis a pitch detector using the wavelet transform is adopted. In response to the problems that appeared in the pitch extraction experiments, a novel pitch detection algorithm is presented. The pitch points in a speech signal exhibit local maxima across several consecutive dyadic scales, and their positions are similar, so the improved approach selects pitch points by a voting strategy rather than by the traditional method. The procedure of the new algorithm is as follows: (i) calculating the wavelet transform across 5 (or 3) consecutive scales; (ii) choosing pitch points by voting; (iii) checking pitch points; (iv) relocating pitch points.
(3) Feature extraction in tone recognition. Feature extraction is a basic problem in pattern recognition. Valid features reflect the important information of a pattern and reduce computation and the recognition error rate. Mandarin tone is characterized by the tone level and the tendency of the pitch curve, so the head-tail difference and the relative tone level ratio are chosen as tone features for Chinese tri-syllabic words; the head-tail difference and the tone level at the beginning of the syllable are chosen as tone features of syllables in continuous speech.
(4) Tone pattern analysis. Under the influence of the preceding and following syllables, the tone characteristics of a syllable in continuous speech vary more than the original tone characteristics, so the tone patterns are more complicated and retain only the basic characteristics of Chinese tone. Tone patterns and their variation rules are important to tone recognition for continuous speech. In this dissertation the tone patterns of isolated words and disyllabic words are introduced, and the tone patterns of tri-syllabic words are analyzed in detail. Tone patterns for continuous speech are studied on the basis of the foregoing work.
(5) Selection and design of the tone recognition model.
Tone patterns of syllables in continuous speech
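Step (ii) of the pitch detection procedure in item (2), choosing pitch points by voting across scales, can be sketched as follows. This is an illustrative reading of the abstract, not the author's algorithm: it assumes that candidate pitch points (positions of local maxima of the wavelet coefficients) have already been found at each scale, and that a candidate is kept when enough scales contain a candidate near the same position. The function name, `tolerance`, and `min_votes` are hypothetical.

```python
def vote_pitch_points(candidates_per_scale, tolerance=2, min_votes=3):
    """Select pitch points that are supported by several wavelet scales.

    candidates_per_scale: list of lists of sample indices, one list per
        scale, finest scale first. Each index marks a local maximum of the
        wavelet coefficients at that scale (assumed to be computed elsewhere).
    tolerance: maximum index distance for two candidates to count as the
        same point (hypothetical parameter).
    min_votes: number of scales, including the finest, that must agree
        (hypothetical parameter).
    """
    if not candidates_per_scale:
        return []
    reference = candidates_per_scale[0]  # positions are taken from the finest scale
    selected = []
    for pos in reference:
        votes = 1  # the finest scale votes for its own candidate
        for other in candidates_per_scale[1:]:
            if any(abs(pos - q) <= tolerance for q in other):
                votes += 1
        if votes >= min_votes:
            selected.append(pos)
    return selected


# Three scales; only positions near 10, 90, and 130 appear at all of them.
scales = [[10, 50, 90, 130], [11, 90, 131], [9, 70, 89, 130]]
print(vote_pitch_points(scales))  # prints [10, 90, 130]
```

Taking positions from the finest scale keeps the best time resolution, while the coarser scales act only as voters; the checking and relocation steps (iii) and (iv) are not covered by this sketch.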
