Research on FPGA-Based Speech Recognition Technology (基于FPGA的语音识别技术研究)

【Author】 Xie Qiuyun (谢秋云)

【Advisor】 Xiao Tiejun (肖铁军)

【Author information】 Jiangsu University, Computer Application Technology, 2007, Master's

【Abstract】 Many existing speech recognition systems are implemented in computer software, but a growing number of applications require small size, portability, and low power consumption. Dedicated speech recognition chips built as integrated circuits therefore have considerable room for development. Current speech chips are built around a DSP core, which is costly, inflexible in design, and difficult to push to higher processing performance. An FPGA (Field-Programmable Gate Array) offers low power consumption, small size, high integration density, high speed, a short development cycle, low cost, user-definable functions, and repeated programming and erasure, and it can implement high-performance parallel algorithms. This thesis studies how to implement speech recognition algorithms on an FPGA. The main work is as follows.

FPGA design methods for digital processing algorithms. Several methods are studied and put into practice: the VLSI architecture design method, the Matlab modeling method for hardware DSP design, and the IP core design method. Using these methods, a number of basic arithmetic units are implemented in hardware and used in the speech recognition algorithms.

Front-end processing and its hardware implementation. The front end comprises pre-emphasis, framing, windowing, and endpoint detection. Endpoint detection is based on energy transitions and is improved with real-time framing, so that endpoints are detected in real time with some robustness to noise.

Feature extraction and its hardware design. The features are Mel Frequency Cepstrum Coefficients (MFCC), which model the characteristics of human hearing and give good recognition performance and noise robustness. Computing them involves a fast Fourier transform (FFT), triangular filtering, a logarithm, and a discrete cosine transform (DCT). The hardware structure of each stage is designed for speed and efficiency. The FFT structure is adapted to real-valued input, which reduces the number of FFT points and raises speed by about 40%. The center frequencies of the triangular filters are converted to the corresponding points of the frequency spectrum, which makes the computation more efficient. The logarithm combines a look-up table with linear interpolation to improve precision. Finally, a three-stage pipelined hardware structure for MFCC computation is proposed, in which triangular filtering, the logarithm, and the DCT run almost in parallel, further speeding up MFCC extraction.

Vector quantization (VQ). In the hardware design, codebook search is accelerated by comparing each result against the minimum value.

Viterbi recognition and its hardware implementation. Hidden Markov models (HMM) are used for acoustic modeling and matching; HMMs are regarded as the most effective approach in terms of computation and storage requirements. The formula of the traditional Viterbi algorithm is modified according to the HMM structure and combined with pruning, which greatly increases search speed. Four ACS units operate in parallel, which simplifies the circuit and increases recognition speed.
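
The abstract does not say how the FFT structure was adapted to real input. One widely used way to cut the number of FFT points for a real sequence is to pack the N real samples into N/2 complex samples, run a single N/2-point complex FFT, and then separate the result. The Python sketch below illustrates that technique only as an assumption about the general idea, not as the thesis's hardware design; the function names `fft` and `real_fft` are hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft(x):
    """Bins 0..N/2 of the spectrum of a real sequence x (N a power of two, N >= 2),
    computed with one N/2-point complex FFT (assumed packing technique)."""
    n = len(x)
    half = n // 2
    # Pack even-indexed samples into the real part, odd-indexed into the imaginary part.
    z = [complex(x[2 * m], x[2 * m + 1]) for m in range(half)]
    Z = fft(z)
    out = []
    for k in range(half + 1):
        zk = Z[k % half]
        zc = Z[(half - k) % half].conjugate()
        e = (zk + zc) / 2    # spectrum of the even-indexed samples
        o = (zk - zc) / 2j   # spectrum of the odd-indexed samples
        # Final butterfly stage recombines the two half-length spectra.
        out.append(e + cmath.exp(-2j * cmath.pi * k / n) * o)
    return out
```

Only bins 0 to N/2 are returned, because the remaining bins of a real signal's spectrum are complex conjugates of these and carry no extra information for the power spectrum used in MFCC computation.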

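The combination of a look-up table with linear interpolation for the logarithm can be illustrated in software. The sketch below is a minimal Python model under stated assumptions: it computes a base-2 logarithm of a positive integer (a natural logarithm differs only by the constant factor ln 2), uses a 17-entry table, and works in floating point, whereas a hardware design would use fixed-point values. The table size `FRAC_BITS` and the function name `log2_lut` are hypothetical and do not come from the thesis.

```python
import math

FRAC_BITS = 4  # assumed index width: 2**4 intervals, 2**4 + 1 table entries
TABLE = [math.log2(1 + i / 2**FRAC_BITS) for i in range(2**FRAC_BITS + 1)]

def log2_lut(value, interpolate=True):
    """Approximate log2 of a positive integer with a table look-up,
    optionally refined by linear interpolation between table entries."""
    if value <= 0:
        raise ValueError("value must be positive")
    exponent = value.bit_length() - 1          # position of the leading one bit
    mantissa = value / (1 << exponent) - 1.0   # fractional part, in [0, 1)
    scaled = mantissa * 2**FRAC_BITS
    index = int(scaled)                        # table entry at or below the mantissa
    if not interpolate:
        return exponent + TABLE[index]
    weight = scaled - index                    # relative distance to the next entry
    return exponent + TABLE[index] + weight * (TABLE[index + 1] - TABLE[index])
```

With this table size, interpolation keeps the error of the model below 0.001 across the tested integer range, while the plain look-up can be off by several hundredths; this is the kind of precision gain the abstract attributes to adding interpolation.
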
【Key words】 speech recognition; FPGA; hidden Markov model (HMM); MFCC; Viterbi
  • 【Online publication contributor】 Jiangsu University
  • 【Online publication issue】 2008, Issue 09
  • 【Classification code】 TN912.34
  • 【Times cited】 9
  • 【Downloads】 778