基于视频的唇部定位和序列切分算法的研究

Research on the Lip Location and Image Sequence Segmentation Algorithm Based on Video

【Author】 姚文娟 (Yao Wenjuan)

【Advisor】 杜明辉 (Du Minghui)

【Author information】 South China University of Technology, Communication and Information Systems, 2011, Master's thesis

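【Abstract (Chinese)】 Lipreading (speechreading) means "reading out" what a speaker says by observing the changing shape of the mouth. It is a new research direction that has grown out of the combined development of artificial intelligence, image processing, pattern recognition, and related fields. It is widely used as an aid to speech recognition, and it also has broad application prospects in identity authentication for security systems, assisted sign language recognition, language learning for people with hearing impairments, and biometric recognition based on lip-movement characteristics. A complete lipreading system usually comprises face detection, lip detection and location, image sequence segmentation (endpoint detection), feature extraction, and lip-language recognition. Detecting and locating the lips accurately and in real time is the first task of any lipreading system and directly affects all subsequent lipreading work. For a given video, segmenting the image sequence of each isolated word is another important step, and it directly affects the lipreading recognition rate. At present, isolated-word segmentation for lipreading recognition is audio-based (based on auditory features), which inevitably leads to incomplete syllable segmentation. This thesis uses a sequence segmentation algorithm that fuses visual and auditory information and thereby improves the lipreading recognition rate. The main research content is as follows: (1) Because lipreading video databases occupy a large amount of storage, which hinders sharing and distribution, and in view of the research content of this thesis, a bimodal database was built in-house and all subsequent processing was carried out on it. (2) After the face is detected with the OpenCV face detection module, and on the basis of extensive experiments, a method is proposed that detects and locates the lips using the structural features and gray-level information of the face; normalization of the lip images is also completed. The method is fairly robust to head movement and camera zoom. (3) Isolated-word segmentation for lipreading recognition is currently, in general, audio-based (based on auditory features), and a classic approach is endpoint detection based on short-time energy. Building on this approach, an improved segmentation algorithm is proposed that uses image comparison in the visual channel, achieving a fusion of visual and auditory information. Experimental results show that the proposed method segments isolated words more completely and, compared with segmentation based on auditory features alone, improves the lipreading recognition rate.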

【Abstract】 Lip-reading is "reading out" what a speaker says by observing his or her lip movements. It is a new research direction that results from the joint development of artificial intelligence, image processing, pattern recognition, and related fields. It has been studied as a complement to speech recognition, and it has also been used for speaker identification in security systems, for sign language recognition, for language learning by people with hearing impairments, and for biometric recognition. A complete lip-reading system typically consists of face detection, lip location, image sequence segmentation, lip movement feature extraction, and lip-reading recognition. Lip location is one of the most important steps of a lip-reading system, and its accuracy affects the whole system. For a video, another important step is the image sequence segmentation of every word, which affects the recognition rate of the lip-reading system. Up to now, all image sequence segmentation methods have been based on audio information, which can lead to incomplete segmentation. In this thesis, we propose a segmentation method that combines audio and video information to improve the recognition rate of lip-reading. The main work is as follows: (1) Considering the difficulty of sharing and distributing large lip-reading video databases, and the research content of this thesis, we set up a small video database for lip-reading. (2) Based on face detection using OpenCV, we analyze the facial structure of a large number of people and propose a lip location method that uses face structure and gray-level information. Normalization of the lip image is also completed. The method uses an invariant reference, so it reflects the real size and shape of the lips and is robust to zoom and to movements of the face. (3) Up to now, all image sequence segmentation methods have been based on audio information, and one of the most classic is the audio segmentation method based on short-time energy. Building on this audio segmentation method, a new and effective method is proposed that improves the recognition rate by combining video information with audio information. Experimental results show that our method makes the segmentation more complete and improves the recognition rate.
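The abstract names short-time energy endpoint detection and an image-comparison step in the visual channel but does not give the algorithm itself. The following is a minimal sketch of one way such a fusion could look, not the thesis's actual method. All function names, the threshold parameters, the use of non-overlapping frames, and the rule "widen each audio segment while the lips are still moving" are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def short_time_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Sum of squared samples in each non-overlapping frame (assumed framing)."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def frame_difference(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Mean absolute gray-level difference between two flattened lip images."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def active_runs(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    runs, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def segment_words(
    audio_energy: Sequence[float],
    lip_motion: Sequence[float],
    energy_thr: float,
    motion_thr: float,
) -> List[Tuple[int, int]]:
    """Hypothetical fusion rule: detect segments from audio energy, then widen
    each one over adjacent frames in which the lips are still moving.
    Both inputs hold one value per video frame. Widened segments are not
    merged here, so neighbours may overlap."""
    n = min(len(audio_energy), len(lip_motion))
    moving = [lip_motion[i] > motion_thr for i in range(n)]
    segments = []
    for start, end in active_runs([e > energy_thr for e in audio_energy[:n]]):
        while start > 0 and moving[start - 1]:
            start -= 1
        while end < n and moving[end]:
            end += 1
        segments.append((start, end))
    return segments
```

In this sketch the audio channel decides where a word is, and the visual channel only extends its boundaries, which is one way to recover lip movement that begins before or ends after the audible part of a syllable.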

【Key words】 lip-reading; lip location; image sequence segmentation