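Research on Chinese Sign Language Recognition Based on Appearance Modeling

【Author】 Yang Quan (杨全)

【Advisor】 Peng Jinye (彭进业)

【Author information】 Northwest University (西北大学), Computer Software and Theory, 2013, PhD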

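【Abstract (translated from the Chinese)】 The aim of research on Chinese Sign Language recognition is to have computers translate the sign language used by deaf people automatically and efficiently, enabling barrier-free communication between sign language and natural speech. This helps deaf people integrate into society, eases their communication with their surroundings, and gives them access to better services. Sign language recognition also has far-reaching research significance in other application areas. As a natural and intuitive mode of interaction, vision-based sign language recognition requires no additional physical input device; it accounts for a large share of human-computer interaction research and can be applied widely across disciplines. As a research topic, it not only helps improve the living, learning, and working conditions of deaf people, but also raises the level at which computers understand natural human language, and may develop into the most natural practical form of human-computer interaction. From the perspective of natural interaction, this thesis studies machine-vision-based sign language gesture tracking, gesture segmentation and extraction, appearance modeling of sign language, algorithms for constructing SVM kernel functions, and recognition of the Chinese manual alphabet. The specific work covers the following aspects:
(1) Exploiting the fact that Kinect captures depth video synchronously, the depth image information in sign language video is used to improve CamShift, yielding DI_CamShift (Depth Image CamShift), an algorithm suited to sign language recognition that tracks more reliably and resists interference better in complex scenes. This depth-based tracker determines the gesture region in the sign language video and locates and tracks the gesture. For gesture extraction, the principal axis direction of the gesture is determined by computation on the gesture depth image, and an elliptical boundary skin-color modeling method based on depth image information is proposed.
(2) For gesture extraction against complex backgrounds, the elliptical boundary skin-color model is combined with a new two-dimensional OTSU algorithm based on the depth integral image. The integral image and particle swarm optimization are then combined within the 2D OTSU algorithm, giving an OTSU algorithm based on the depth integral image and particle swarm optimization for extracting gesture images.
(3) For appearance modeling, SURF features, Gabor wavelet texture features, and color histogram features are extracted together as a complete feature set of gesture appearance, so that the various visual characteristics are described more accurately. To handle the differing number of local feature points across gesture images, the thesis proposes using the BoW (Bag of Words) method to quantize the extracted complete feature set into sign language visual words (Sign Language Visual Word). The extracted gesture features are clustered with the K-Means algorithm to generate a SURF bag of words, a Gabor bag of words, and a color histogram bag of words; finally, the result of fusing all of these bags via CCA is used as the sign language feature.
(4) SVM and kernel function theory are studied. Single-kernel SVM classifiers are trained on the same extracted gesture features to compare how SVMs with different kernel functions perform in sign language recognition. A new kernel function suited to sign language recognition, H_Kernel, is constructed and proved to satisfy the Mercer condition, so it can serve as an SVM kernel for sign language recognition. Because the sign language BoW model does not take semantic information into account, the thesis proposes an SVM with a mixed kernel function, built from H_Kernel and a BoW semantic kernel, for training and sign language recognition.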
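
The bag-of-words step in item (3) can be illustrated with a small sketch. It is not the thesis's implementation: the toy 2-D descriptors stand in for real SURF, Gabor, or color features, the function names are hypothetical, and the CCA fusion step is omitted. It only shows how K-Means clustering gives a visual vocabulary and how images with different numbers of local descriptors then map to histograms of the same length.

```python
import random


def nearest(centers, x):
    """Index of the center closest to descriptor x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(centers[j], x)),
    )


def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the visual vocabulary)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bow_histogram(centers, descriptors):
    """L1-normalised count of visual words; the length is always len(centers)."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Two hypothetical gesture images with 3 and 5 local descriptors respectively.
image_a = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]]
image_b = [[5.1, 4.9], [4.9, 5.0], [5.0, 5.0], [0.0, 0.0], [0.1, 0.1]]

vocab = kmeans(image_a + image_b, k=2)
hist_a = bow_histogram(vocab, image_a)
hist_b = bow_histogram(vocab, image_b)
print(len(hist_a), len(hist_b))  # both equal the vocabulary size
```

Whatever the number of descriptors per image, each histogram has one bin per visual word, which is what lets a fixed-input classifier such as an SVM consume them.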

【Abstract】 Sign language recognition uses a computer to translate sign language into text or speech, enabling communication between natural language and sign language. It serves deaf people by helping them communicate with hearing people and feel at ease in society. Sign language recognition also has significant value in other application fields. Vision-based sign language recognition is a natural and intuitive mode of interaction: it needs no additional physical device, carries considerable weight in human-computer interaction research, and can be applied widely across many fields. As a research subject, it not only helps improve the living, learning, and working environment of deaf people, but also raises the level at which computers understand natural human language, and may develop into the most natural form of human-computer interaction. This thesis studies sign language gesture tracking, gesture segmentation, appearance-based gesture modeling, methods for constructing SVM kernel functions, and recognition of the Chinese finger alphabet. The specific work covers the following aspects:
(1) Based on the characteristics of Kinect depth video, CamShift is improved into a new tracking algorithm, DI_CamShift (Depth Image CamShift). In complex scenes it tracks more robustly, resists interference better, and is suited to sign language recognition. DI_CamShift is run on the sign language video to locate the gesture. The principal axis direction is determined by computation on the depth image, and a new modeling method based on depth image information is developed.
(2) For gesture extraction against complex backgrounds, the elliptical boundary model is combined with a new 2D OTSU algorithm based on the integral depth image. The integral image and particle swarm optimization are then both applied to the 2D OTSU algorithm to obtain an optimized algorithm.
(3) For appearance modeling of sign language, a complete feature set of the gesture is constructed to describe its various visual features accurately; it includes SIFT features, Gabor features, SURF features, and a color histogram. To handle the varying number of feature points, the thesis uses the BoW (Bag of Words) method, in which the sign language feature set is used to generate sign language visual words. All gesture features are clustered with the K-Means algorithm to generate a SURF BoW, a Gabor BoW, and a color histogram BoW, and all BoWs are fused by CCA.
(4) Building on a study of SVM theory and kernel functions, the thesis trains SVM classifiers on the same extracted gesture features and compares the effect of SVMs with different kernel functions in sign language recognition. A new kernel, H_Kernel, is developed and proved to satisfy the Mercer condition, making it suitable for sign language recognition. Because the BoW model does not contain sign language semantic information, a method based on H_Kernel and a BoW semantic kernel function is proposed for SVM learning and sign language recognition.
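
The mixed-kernel idea in item (4) rests on a standard fact: a non-negative weighted sum of Mercer kernels is again a Mercer kernel. The sketch below illustrates this with stand-in kernels only. The abstract does not define H_Kernel or the BoW semantic kernel, so a Gaussian RBF kernel and a histogram intersection kernel are substituted here as assumptions, and the weight of 0.6 is arbitrary. The random quadratic-form check is a numerical sanity test of positive semi-definiteness on one sample set, not a proof.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel, a standard Mercer kernel (stand-in, not H_Kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def intersection_kernel(x, y):
    """Histogram intersection kernel, a Mercer kernel on non-negative histograms."""
    return sum(min(a, b) for a, b in zip(x, y))


def mixed_kernel(x, y, weight=0.6):
    """Convex combination of the two stand-in kernels (0 <= weight <= 1)."""
    return weight * rbf_kernel(x, y) + (1.0 - weight) * intersection_kernel(x, y)


def gram_matrix(kernel, samples):
    """Matrix of pairwise kernel values K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(a, b) for b in samples] for a in samples]


def min_quadratic_form(gram, trials=200, seed=0):
    """Smallest z^T K z over random z; a clearly negative value would rule out PSD."""
    rng = random.Random(seed)
    n = len(gram)
    worst = float("inf")
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        value = sum(z[i] * gram[i][j] * z[j] for i in range(n) for j in range(n))
        worst = min(worst, value)
    return worst


# Hypothetical L1-normalised BoW histograms for four gesture images.
histograms = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.1, 0.9], [0.4, 0.4, 0.2]]
gram = gram_matrix(mixed_kernel, histograms)
worst_case = min_quadratic_form(gram)
print(worst_case >= -1e-9)
```

A Gram matrix built this way can be passed to any SVM implementation that accepts precomputed kernels; the choice of weight would normally be tuned on validation data.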

  • 【Online publication contributor】 Northwest University
  • 【Online publication year and issue】 2014, Issue 12