
Research on Speech Recognition Methods in In-Vehicle Environments (车载环境下语音识别方法研究)

Speech Recognition Investigation in Car Environments

【Author】 Ma Longhua (马龙华)

【Advisor】 Hao Yanling (郝燕玲)

【Author information】 Harbin Engineering University, Navigation, Guidance and Control, 2008, PhD

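【Summary】 For more than half a century, speech recognition has been a focus of research. Speech is the most common means of human communication, so devices that use speech recognition as their human-machine interface are considerably more convenient to use. In China, cars have become an increasingly common part of everyday life over the past ten years and bring many kinds of convenience. People increasingly prefer feature-rich cars, which means more and more kinds of in-car electronic devices and more complex operation, and it is dangerous for drivers to take their hands off the steering wheel to operate these devices. Equipping in-vehicle electronic devices with a voice-controlled human-machine interface is therefore an optimal choice. Because comparable systems are still absent in China, research in this area can fill that gap.

First, the thesis examines endpoint detection, one of the technical difficulties of in-car speech recognition, and studies the popular endpoint detection methods closely. Noise in the operating environment lowers the detection accuracy of these algorithms inside a car. In response, the thesis proposes an endpoint detection method based on adaptive single-well-function subband entropy, which detects speech endpoints well under in-car noise. In some situations car horn sounds interfere with recognition; for this the thesis proposes a solution based on changes in frequency-band features, which resolves the problem.

Second, in-car noise is unavoidable in real applications. The thesis studies the two main noise removal methods, spectral subtraction and power spectral subtraction, together with the issues that need attention in practice. It adopts noise removal based on spectral subtraction and achieves speech enhancement.

Third, the thesis studies the feature parameters commonly used in speech recognition, mainly linear prediction coefficients and Mel-frequency cepstral coefficients. The part of the noise that is masked by speech cannot be heard by the human ear, but it still changes the speech feature parameters and so lowers the recognition rate. Removing this part can raise the recognition rate. Based on the actual characteristics of in-car noise, the thesis proposes Mel-frequency cepstral coefficients improved with the masking effect from psychoacoustics, and experiments show a certain improvement in recognition rate under in-car noise.

Next, the thesis studies dynamic time warping and hidden Markov model recognition methods in detail, covering the dynamic time warping algorithm and its improvements, the hidden Markov model, the problems to be solved in implementation, and a clustering-based fast algorithm for hidden Markov models. This work was decisive for choosing the recognition method and the speech feature parameters used in the final experiments.

Finally, the experimental part presents the methods, procedure, and speech corpus used. Recognition is tested in two experiments: one based on the dynamic time warping algorithm, the other based on the hidden Markov model, for which a new method is proposed that increases computation speed while still meeting the recognition rate requirement. The experiments show that the hidden Markov model is somewhat more efficient at recognition than dynamic time warping, can accommodate systems with larger vocabularies, and reaches a recognition rate as high as 98%. The hidden-Markov-model-based in-car speech recognition system designed in this thesis can therefore serve as the voice-controlled human-machine interface for in-vehicle electronic devices, filling the gap in China in this area and offering a new approach to driving safety.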

【Abstract】 The study of speech recognition has been under way for well over half a century. Speech recognition offers great convenience in people's lives. In China, cars have played an increasingly important role over the past ten years. Cars have changed people's lives in many respects, and because people prefer cars with many functions, cars carry more and more electronic devices. More devices mean more complex operation, but it is very dangerous for drivers to take their hands off the steering wheel to operate them. In-car electronic devices with a speech-controlled human-machine interface may be the best solution to this problem. Because comparable speech recognition systems are still absent in China, research in this area can fill that gap.

First, we analyze the technical difficulties of speech recognition in car noise environments and give solutions to them. Speech endpoint detection in car noise is more difficult than in clean speech. We investigate several popular endpoint detection techniques and identify their weaknesses. A new method, adaptive subband entropy based on a single-well function, is chosen to solve the problem, and it works better than the other methods in car noise. In this particular noise background, a car horn looks very similar to speech in the spectrogram, so horn sounds are confused with speech. A new method based on variation across frequency subbands is adopted to compensate for this, and it works well.

Second, speech contaminated by car noise has a low recognition rate. To overcome this drawback, we study two popular noise cancellation techniques, spectral subtraction and power spectral subtraction, along with the practical details of applying them. Our system adopts spectral subtraction to achieve speech enhancement.

Third, we study the speech features used in speech recognition, mainly Linear Predictive Coefficients and Mel-Frequency Cepstral Coefficients. In a noisy environment, the noise masked by speech may not be heard, but it still influences the recognition rate, so it should be removed. In this thesis we use psychoacoustics to modify the Mel-Frequency Cepstral Coefficients, and experiments show that this improves the recognition rate.

After that, we study Dynamic Time Warping and the Hidden Markov Model carefully, and investigate how to modify them to achieve good performance. We adopt a clustering-based Hidden Markov Model. This work provides the basis for choosing the classifier, the experimental setup, and the speech features.

Lastly, we describe our experimental setup and speech database. Our experiments use Dynamic Time Warping and the Hidden Markov Model as classifiers. In the Hidden Markov Model experiments we use a new method that speeds up the calculation without decreasing the recognition rate. The experiments show that the Hidden Markov Model classifier suits large-vocabulary systems and achieves a better recognition rate than the Dynamic Time Warping classifier. The results show that the recognition rate can reach 98%, so the system can be used as the human-machine interface for in-car electronic devices. It offers a new way to improve driving safety and fills the gap in this area.
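The abstract names Dynamic Time Warping as one of the two classifiers but does not describe the variant used. As a rough illustration only, the sketch below shows the textbook form of template matching with Dynamic Time Warping; it is not the thesis's improved algorithm. The function names `dtw_distance` and `recognize`, the `templates` dictionary, and the use of scalar frames in place of real feature vectors (such as Mel-Frequency Cepstral Coefficients) are all assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Textbook dynamic time warping cost between two non-empty sequences.

    Hypothetical helper: frames are scalars here; a real system would
    pass feature vectors and a vector distance as `dist`.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    # Toy templates standing in for per-word feature sequences
    templates = {"up": [1, 2, 3, 4], "down": [4, 3, 2, 1]}
    print(recognize([1, 1, 2, 3, 3, 4], templates))
```

Because every utterance is compared against every stored template, the cost of this approach grows with the vocabulary, which is consistent with the abstract's remark that the Hidden Markov Model classifier suits larger vocabularies better.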
