场景文本识别关键技术研究

Study on Key Technologies of Scene Text Recognition

【Author】 Yin Fang

【Advisor】 Chen Deyun

【Author Information】 Harbin University of Science and Technology, Computer Application Technology, 2012, Ph.D.

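【Chinese Abstract】 Scene images contain rich text information that greatly helps people capture and understand the content and meaning of a scene, so the text in a scene image plays an extremely important role in acquiring that image's visual information. If computers could automatically recognize the text in scene images, and the results were applied to fields such as navigation aids for the blind, driverless navigation, security, and crisis prevention and handling, people's work and daily life would become far more convenient. Scene text recognition differs markedly from traditional optical character recognition (OCR), mainly because scene text images differ from traditional scanned documents. Scene text images are captured mainly with digital cameras, video cameras, and similar devices; they exhibit inconsistent color, uneven brightness, complex and variable backgrounds, and strong noise, and the text may be deformed, blurred, incomplete, or have broken strokes. These interfering factors make scene text recognition very difficult and pose many challenges. To address these problems, this thesis studies several key technologies of scene text recognition: text extraction from complex backgrounds, correction of text deformation in natural scenes, and recognition of single characters in scene text.

Because the complex background of a scene text image degrades recognition, the characteristics of scene text images are analyzed and a preprocessing step is applied before recognition to extract the text from the complex background; on this basis, a spectral clustering text extraction method based on the normalized cut is proposed. First, a similarity weight function is built according to the characteristics of text images. Next, following the color distribution of scene text, the color space is quantized according to the color histogram, yielding a limited number of pixel sets of different colors, and the similarity matrix is constructed in units of quantized color levels, combined with the texture features and spatial distribution of the pixels. Finally, the image is segmented by spectral clustering under the normalized cut criterion. The method takes the quantized color sets as the vertices of the graph partition, which simplifies the weighted graph model, significantly reduces the computational complexity of spectral clustering, and improves its applicability to image segmentation. Experiments on the ICDAR 2003 and 2009 competition test sets and on a large number of other text images show that the method extracts text well.

For the skew and perspective deformation caused by a tilted text carrier or a tilted camera viewpoint during capture, a deformation correction method based on mathematical morphology is proposed. Morphological methods extract feature points, with different morphological factors chosen for different kinds of deformation. Clustering and nearest-neighbor methods then fit the text baselines from the clustering information of the feature points, and the random sample consensus (RANSAC) algorithm computes the baseline positions to obtain the deformation parameters. Finally, a projective transformation corrects the deformation of the text image. Experimental results show that the proposed method can correct scene text with a certain degree of deformation and thereby improve the accuracy of the text recognition system; it has a clear advantage over other methods particularly for scene text with few lines.

For blurred writing, broken strokes, strong noise, and similar problems in scene text, this thesis proposes an improved, highly robust method for extracting Gabor wavelet features. The method first selects and classifies filter orientations on the basis of the basic Gabor wavelet transform, then extracts Gabor features with the orientation-selective wavelet transform, and combines them with histograms to obtain a combined feature for recognition. A series of comparative experiments shows that the combined feature classifies low-quality character images, such as those with blurred strokes, well.

In pursuit of a high-performance scene text recognition system, this thesis proposes a text recognition method based on background correlation analysis. Targeting the relationship between text in a scene and its background, the method uses canonical correlation analysis (CCA) to mine the correlation between background and text, and extracts canonical correlation features between the character image and the background image as character classification features. Tests on a scene text sample set gave satisfactory results, and the experimental data show that canonical correlation features significantly improve scene text recognition performance, indicating the effectiveness of this classification feature. The method breaks through the limitation of traditional recognition methods, which consider only the features of the text itself, and makes full use of the information surrounding the text in the image, which is a new advance in research on scene text recognition methods. The experimental results also indicate that using background information beyond the characters to assist recognition is a topic worth further study, and that it offers a new line of research toward high-performance scene text recognition systems.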
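The abstract names random sample consensus (RANSAC) as the way baseline positions are computed. As a rough illustration only, the sketch below fits a line y = a·x + b to 2-D feature points that include outliers. The function name `ransac_line`, its parameters, the tolerance, and the synthetic points are all assumptions made for this example; the thesis abstract does not specify them.

```python
import random

def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """Fit y = a*x + b to 2-D points with RANSAC; returns (a, b, inliers)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Points whose vertical residual is within `tol` support this candidate.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    a, b, inliers = best
    # Refine the winning candidate by least squares on its consensus set.
    n = len(inliers)
    if n >= 2:
        mx = sum(x for x, _ in inliers) / n
        my = sum(y for _, y in inliers) / n
        sxx = sum((x - mx) ** 2 for x, _ in inliers)
        sxy = sum((x - mx) * (y - my) for x, y in inliers)
        if sxx > 0:
            a = sxy / sxx
            b = my - a * mx
    return a, b, inliers

# Hypothetical feature points: a baseline y = 0.1*x + 5 plus four stray points.
baseline = [(float(x), 0.1 * x + 5.0) for x in range(0, 100, 5)]
outliers = [(10.0, 40.0), (35.0, -20.0), (60.0, 55.0), (80.0, -3.0)]
a, b, inliers = ransac_line(baseline + outliers)
```

A plain least-squares fit over all 24 points would be pulled toward the stray points, whereas the consensus step ignores them. In a correction pipeline of the kind described, the slopes of several such baselines would then feed the estimate of the deformation parameters.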

【Abstract】 Images of natural scenes often contain rich text information, which helps people to a large extent to capture and understand the content and meaning of the scene. Text in natural scenes therefore plays an important role in acquiring the visual information of an image. If computers can recognize the text in natural scene images automatically, and the results are applied to navigation aids for the blind, unmanned navigation, security, crisis prevention and handling, and other fields, daily life will become more convenient.

Scene text recognition differs essentially from traditional optical character recognition (OCR). Scene text is mainly captured by digital cameras or video cameras, so the images have inconsistent color, uneven brightness, complicated backgrounds, and strong noise, and the text may be deformed, low in resolution, or have broken strokes. These factors make scene text recognition difficult and pose many challenges. This thesis studies scene text recognition systems and the key technical issues involved.

Because the complex background of scene text degrades recognition, the characteristics of scene text images and the state of research on text segmentation are analyzed, and image preprocessing is adopted as the first step to separate the text from the complex background. On this basis, an improved text image segmentation method based on spectral clustering is proposed. First, a similarity function is established that takes the characteristics of text images into account. Then, according to the color distribution of scene images, the color space is quantized using the color histogram to obtain a limited number of pixel sets, one per color, and the affinity matrix is constructed over the quantized levels. Finally, the method segments the image by spectral clustering under the normalized cut (Ncut) criterion.
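The three steps just described (color quantization, an affinity matrix over the quantized levels, and spectral clustering under the Ncut criterion) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `ncut_on_quantized_colors`, the uniform per-channel quantization, the Gaussian color affinity, and the values of `levels` and `sigma` are all chosen for the example, and the texture and spatial-distribution terms mentioned in the thesis are omitted.

```python
import numpy as np

def ncut_on_quantized_colors(pixels, levels=4, sigma=50.0):
    """Two-way normalized cut whose graph vertices are quantized colors, not pixels.

    Assumes the pixels fall into at least two distinct color bins.
    """
    pixels = np.asarray(pixels, dtype=np.int64)
    # 1. Quantize each RGB channel into `levels` bins and give every pixel a bin id.
    q = (pixels * levels) // 256
    ids = (q[:, 0] * levels + q[:, 1]) * levels + q[:, 2]
    _, inverse = np.unique(ids, return_inverse=True)
    inverse = inverse.reshape(-1)
    k = int(inverse.max()) + 1
    centers = np.array([pixels[inverse == i].mean(axis=0) for i in range(k)])
    # 2. Affinity between color bins (color distance only; an assumption of this sketch).
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 3. Normalized cut: second-smallest eigenvector of I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(k) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues come back in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]
    bin_labels = fiedler > 0
    # 4. Every pixel inherits the label of its color bin.
    return bin_labels[inverse]

# Hypothetical pixels: two dark "text" colors and two bright "background" colors.
pixels = np.array([[20, 20, 20]] * 5 + [[90, 30, 30]] * 5 +
                  [[230, 230, 220]] * 5 + [[200, 210, 140]] * 5)
labels = ncut_on_quantized_colors(pixels)
```

The eigen-decomposition here runs on a matrix with at most `levels ** 3` rows, however many pixels the image has, which is the source of the complexity reduction.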
Using the quantized color sets as the vertices of the graph simplifies the weighted graph model, so the computational complexity of spectral clustering is reduced significantly and the method becomes more applicable to image segmentation. Experiments on the test images of the ICDAR 2003 and 2009 competitions and on many other text images show that the proposed method performs well on text segmentation.

A distortion correction method based on mathematical morphology is presented to correct the skew and perspective distortion of scene text caused by the tilt of the text carrier itself or of the camera view. Mathematical morphology is used to extract feature points, with different morphological factors selected for different kinds of distortion. Clustering and nearest-neighbor methods then fit the text baselines from the clustering information of the feature points, and RANSAC (Random Sample Consensus) locates the baselines so that the distortion parameters can be extracted. Finally, a projective transformation is applied to correct the distortion of the text image. Experiments show that the method effectively corrects text image distortion and improves the text recognition rate significantly; it is especially advantageous for scene text with few lines.

To address low resolution, poor quality, serious noise, and similar problems in scene text recognition, this thesis uses highly robust Gabor wavelet transform features as classification features. The original Gabor wavelet transform is further improved by pre-classifying filter directions and by fusing the resulting features with histograms. A series of comparative experiments shows that the proposed features classify low-quality character images with blurred strokes well.

A text recognition method based on canonical correlation analysis (CCA) of the background is proposed in pursuit of a high-performance scene text recognition system.
Based on the correlation between scene text and its background, the method uses CCA to mine the correlation between background and text, and extracts CCA features as the classification features for characters. The method obtained satisfactory results, and the experimental data show that CCA features can significantly improve the performance of scene text recognition, which indicates that the method is effective. It breaks through the limitation of traditional recognition approaches, which only consider the characteristics of the text itself, and makes full use of the information surrounding the text in the image; this is a new advance in scene text recognition. The experimental results also show that using background information to assist recognition is a subject worth further research, and that it provides a new research idea for achieving high-performance scene text recognition systems.
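To make the CCA step concrete, the following is a minimal sketch of how canonical correlations and projections can be computed between two feature sets. It is an illustration under assumptions, not the thesis's procedure: the function name `cca`, the regularization term `reg`, and the synthetic data are invented for this example, with `X` standing in for character-image features and `Y` for background features, whose actual construction the abstract does not describe.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Canonical correlations and projection matrices for X (n x p) and Y (n x q)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)            # Lx^{-1} Cxy
    M = np.linalg.solve(Ly, M.T).T          # Lx^{-1} Cxy Ly^{-T}
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)            # projection for X: Lx^{-T} U
    B = np.linalg.solve(Ly.T, Vt.T)         # projection for Y: Ly^{-T} V
    return s, A, B

# Synthetic data: one shared latent factor hidden among noise dimensions.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])
Y = np.hstack([rng.normal(size=(200, 1)),
               -2.0 * z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
s, A, B = cca(X, Y)
u = (X - X.mean(axis=0)) @ A[:, 0]          # first canonical variate of X
v = (Y - Y.mean(axis=0)) @ B[:, 0]          # first canonical variate of Y
```

In a recognition setting of the kind the abstract describes, projected variates such as `u` and `v` would serve as the correlation-based features passed to a classifier.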
