Research and Design of a Small-Vocabulary, Speaker-Independent Isolated-Word Speech Recognition System

(Original title: 小词汇非特定人的孤立词语音识别系统的研究与设计)

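【Author】 Wang Bing (汪冰)

【Advisor】 Tang Rongjiang (汤荣江)

【Author information】 Guangdong University of Technology (广东工业大学), Computer Application Technology, 2008, Master's thesis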

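【Abstract (translated from Chinese)】 Speech recognition is a branch of speech signal processing: it is the technology by which a machine converts a speech signal into the corresponding text or command through a process of recognition and understanding. It is an interdisciplinary field that draws on artificial intelligence, pattern recognition, digital signal processing, computer science, linguistic acoustics, psychology, physiology, cognitive science, and many other disciplines, and it has far-reaching research value. Speech recognition and speech synthesis have become a hallmark of modern technological development and an important area of modern computer research and development. Although speech recognition has achieved some success and some products have reached the market, most speech recognition systems are still confined to the laboratory and fall well short of practical requirements. Current research focuses on how to achieve online unsupervised learning and adaptive learning algorithms that combine multiple methods; the fundamental obstacles to practical use fall into two categories, recognition accuracy and system complexity. By task, speech recognition can be divided into four areas: speaker recognition, keyword spotting, language identification, and continuous speech recognition. This thesis studies algorithms for small-vocabulary, speaker-independent isolated-word speech recognition. The main stages of speech recognition are preprocessing of the speech signal, endpoint detection, feature extraction, construction of the speech template library, and pattern matching. The thesis first discusses the basic principles of speech recognition and the characteristics of various recognition algorithms, compares them and selects an effective speaker-independent isolated-word algorithm, analyzes its implementation in depth, and finally develops it in VC. The classic recognition algorithm based on the dynamic time warping model is commonly used in speaker-independent, small-vocabulary systems. This thesis proposes an endpoint detection technique with a degree of robustness: it improves the traditional dual-threshold endpoint detection method based on zero-crossing rate and short-time energy by introducing a variable-threshold method that adjusts the thresholds automatically according to the data in the speech file. The algorithm was simulated and tested in Matlab; the experiments show that it improves endpoint detection accuracy to some extent, and it was then implemented in VC. During speech acquisition, calling the low-level API reduces, to some extent, the effect of noise on the speech data. Linear prediction cepstral coefficients (LPCC) are extracted as features from the speech waveform, and the template library is built by matching and clustering templates with dynamic time warping (DTW). Finally, the experimental results of the algorithm are tested and analyzed.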
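As an illustration of the LPCC features mentioned in the abstract, the sketch below converts linear prediction coefficients into cepstral coefficients using the standard recursion for an all-pole model. The sign convention H(z) = 1 / (1 - Σ a_k z^-k), the function name `lpc_to_cepstrum`, and the use of Python are assumptions made for this example; the thesis was implemented in VC and its exact formulation is not given on this page. The predictor coefficients themselves would typically come from an autocorrelation analysis of each frame, which is not shown.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC predictor coefficients to cepstral coefficients.

    Assumed convention (not taken from the thesis):
        H(z) = 1 / (1 - sum_{k=1..p} a_k z^-k)
    `a` holds [a_1, ..., a_p]; the result holds [c_1, ..., c_n_ceps].
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k},
        # restricted to indices where a_{n-k} is defined (1 <= n-k <= p).
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

For a single-pole model with a_1 = 0.5, the cepstrum is known in closed form as c_n = 0.5^n / n, so `lpc_to_cepstrum([0.5], 3)` yields values close to 0.5, 0.125, and 0.0417.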

【Abstract】 Speech recognition is a branch of speech signal processing: through a process of recognition and understanding, it converts a speech signal into the corresponding text or command. It is an interdisciplinary field involving artificial intelligence, pattern recognition, digital signal processing, computer science, linguistic acoustics, psychology, physiology, cognitive science, and many other disciplines, and it is of considerable theoretical value. Speech recognition has become an important research field and a hallmark of technological development. Although the technology has achieved some success, most speech recognition systems are still confined to the laboratory and run into problems once moved out of it, so they remain far from practical. Current research focuses on online unsupervised learning and on adaptive learning algorithms that integrate multiple methods. The fundamental obstacles to practical use fall into two categories: recognition accuracy and system complexity. By task, speech recognition can be divided into four areas: speaker recognition, keyword spotting, language identification, and continuous speech recognition. This thesis focuses on speaker-independent isolated-word speech recognition algorithms. It studies the fundamentals of speech recognition and its algorithms, compares speaker-independent isolated-word recognition algorithms, and selects effective approaches for the system. It then examines how to implement the chosen algorithm on a personal computer, and the algorithm was finally implemented in Visual C++. The DTW (Dynamic Time Warping) model, a classic recognition algorithm, is often used in speaker-independent, small-vocabulary speech recognition systems.
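To make the DTW matching step concrete, here is a minimal sketch of template matching with dynamic time warping. It is an illustration under stated assumptions, not the thesis implementation: the function names `dtw_distance` and `recognize`, the Euclidean local distance, and the unconstrained three-way step pattern are all choices made for this example.

```python
import math


def euclidean(u, v):
    """Local distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(features, templates):
    """Return the word whose template is nearest to `features` under DTW.

    `templates` maps each word to a list of template sequences
    (a hypothetical layout for the template library).
    """
    best_word, best_dist = None, float("inf")
    for word, sequences in templates.items():
        for seq in sequences:
            dist = dtw_distance(features, seq)
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word
```

Because DTW lets one frame align with several frames of the other sequence, an utterance spoken more slowly than its template can still match it with a small accumulated distance, which is why the method suits isolated-word recognition with a small vocabulary.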
This thesis presents an endpoint detection technique intended to make speech recognition more robust. It builds on the traditional dual-threshold endpoint detection method, which is based on zero-crossing rate and short-time energy, and adjusts the thresholds automatically according to the speech data (variable-threshold endpoint detection). The algorithm was simulated and tested in Matlab; the tests show that it improves endpoint detection accuracy to some extent. The algorithm was then implemented in VC. During speech signal acquisition, calling the Windows API reduces the effect of noise on the speech data to some extent. In this system, features are extracted as linear prediction cepstral coefficients (LPCC), pattern matching uses dynamic time warping (DTW), and the speech template library is built by clustering. Finally, the test results of the algorithm are analyzed.
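The general idea of dual-threshold endpoint detection with data-dependent thresholds can be sketched as follows. This page does not give the thesis's actual adjustment rule, so the rule used here is purely hypothetical: the first few frames are assumed to be background noise, and the thresholds are placed at fixed fractions between the noise level and the peak energy. The function names, frame sizes, and weighting constants are likewise assumptions for illustration.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))


def detect_endpoints(samples, frame_len=200, hop=80, noise_frames=5):
    """Return (start_sample, end_sample) of the detected speech, or None."""
    starts = range(0, len(samples) - frame_len + 1, hop)
    frames = [samples[i:i + frame_len] for i in starts]
    if len(frames) <= noise_frames:
        return None
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_count(f) for f in frames]

    # Assumption: the leading frames contain only background noise.
    noise_e = sum(energy[:noise_frames]) / noise_frames
    noise_z = sum(zcr[:noise_frames]) / noise_frames
    peak_e = max(energy)

    # Hypothetical data-dependent thresholds (not the thesis's rule).
    high = noise_e + 0.25 * (peak_e - noise_e)
    low = noise_e + 0.05 * (peak_e - noise_e)
    zcr_thr = noise_z + 0.25 * frame_len

    # Stage 1: frames that clearly contain speech (above the high threshold).
    loud = [i for i, e in enumerate(energy) if e > high]
    if not loud:
        return None
    start, end = loud[0], loud[-1]

    # Stage 2: widen the segment while energy stays above the low threshold.
    while start > 0 and energy[start - 1] > low:
        start -= 1
    while end < len(frames) - 1 and energy[end + 1] > low:
        end += 1

    # Stage 3: widen further over high zero-crossing frames, which may be
    # low-energy unvoiced consonants at the word boundaries.
    while start > 0 and zcr[start - 1] > zcr_thr:
        start -= 1
    while end < len(frames) - 1 and zcr[end + 1] > zcr_thr:
        end += 1

    return start * hop, end * hop + frame_len
```

Deriving the thresholds from each recording, rather than fixing them in advance, lets the same code handle recordings with different gain or background noise levels, at the cost of relying on the assumption that the recording begins with silence.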

  • 【Classification code】TP391.42
  • 【Times cited】3
  • 【Downloads】371