

The Research of Off-Line Handwritten Chinese Character Recognition Based on Large-Set

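【Author】 周双飞 (Zhou Shuangfei)

【Supervisor】 刘纯平 (Liu Chunping)

【Author information】 Soochow University (苏州大学), Computer Application Technology, 2011, Master's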

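【Abstract (translated from the Chinese)】 Off-line handwritten Chinese character recognition has broad application prospects in the automated processing and intelligent input of Chinese text. Because handwritten Chinese characters are written freely, include many similar characters, and vary widely in form, off-line handwritten Chinese character recognition is both a difficult and an active topic in character recognition. This thesis takes off-line handwritten text images as its object of study and investigates three aspects: binarization of text images, segmentation of Chinese characters, and recognition based on the fusion of multiple features and multiple classifiers. It seeks an off-line handwritten Chinese character recognition scheme that distinguishes similar features well on a large character set. The research content is as follows. (1) To address the effect of uneven illumination on text image binarization, an adaptive document image binarization method based on edge contours is proposed. The method estimates the text foreground region with a thresholding method based on LoG edge contour growing, which effectively reduces stroke loss and broken strokes and also resolves the large blocks of noise produced during foreground estimation. Next, taking as the measure the gray-level differences between the current pixel and both the local background mean and the foreground region mean, a noise-suppression parameter is introduced to improve the threshold formula and suppress noise further. Experiments show that the method suppresses noise effectively and preserves the structural integrity of the characters well. (2) For segmenting touching or crossing characters in handwritten Chinese text, a multi-step segmentation method for off-line handwritten Chinese characters based on the minimum weighted segmentation path is proposed. The method inherits the earlier idea of combining coarse and fine segmentation. It first performs coarse segmentation with a projection method, dividing the characters into touching and non-touching classes. In the fine segmentation stage, it abandons the commonly used serial segmentation approach and directly uses the statistics from coarse segmentation to set the initial segmentation path. Then, following the shortest segmentation path idea, it searches for and modifies the segmentation path within a local neighborhood of the initial path using a minimum-weight algorithm, thereby obtaining the best weighted segmentation path. Experiments show that the method handles under-segmentation and characters touching at multiple points well, effectively improves segmentation accuracy, and has low time complexity. (3) To further improve the recognition rate on a large character set, a word-based cascaded hidden Markov training model that reflects contextual relationships is applied to similar-character recognition, together with a corresponding cascaded recognition method, in an attempt to raise the recognition rate of similar characters from the classifier side. Then, exploiting the strengths of different classifiers, a multi-feature, multi-classifier ensemble scheme incorporating the word-level cascaded HMM is designed. It lets similar and non-similar characters adaptively select a suitable method for targeted recognition, effectively improving the overall recognition rate.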

【Abstract】 Off-line handwritten Chinese character recognition has a wide range of applications in the automated processing and intelligent input of Chinese characters. Handwritten Chinese characters are written arbitrarily, include many similar characters, and vary irregularly in shape, which makes off-line handwritten Chinese character recognition both a popular and a difficult topic in Chinese character recognition. This thesis takes document images as its object of study. We carry out preliminary research on document image binarization, handwritten Chinese character segmentation, and recognition based on multi-classifier fusion with multiple features, and attempt to find a high-accuracy off-line handwritten Chinese character recognition scheme.

First, because binarization results are affected by uneven illumination and the BG algorithm is inefficient, an adaptive document image binarization method based on contours is proposed. A pixel growth method based on contours obtained with the LoG operator is used to estimate the foreground region of the text image. It reduces the number of broken strokes and solves the problem of local noise. A new parameter is added to the threshold formula to suppress noise further. Experimental results show that the method suppresses noise effectively and preserves the integrity of Chinese character structure.

Second, a multi-step segmentation method is proposed to segment connected or overlapping Chinese characters in ancient documents. It inherits the earlier approach of combining rough segmentation with fine segmentation. The projection profile histogram method is first used to separate the characters that neither touch nor overlap from the blocks of the character string. Then, for the touching characters in wide blocks, an initial segmentation path is set from the statistics of the rough segmentation, and a minimum-weight segmentation path algorithm searches for and modifies the path within the local neighborhood of that initial path. Experimental results show that the proposed method handles under-segmented characters and characters touching at multiple points, effectively improves the accuracy of handwritten Chinese character segmentation, and has low time complexity.

Third, to improve the recognition rate on a large character set, a cascaded HMM training algorithm is proposed for similar-character recognition, together with a cascaded recognition method that raises the recognition rate of similar characters. Then, according to the characteristics of different classifiers, a multi-classifier fusion method with multiple features based on the cascaded HMM is proposed, which adaptively selects an appropriate recognition algorithm according to character features. Experimental results show that the proposed method effectively increases the recognition rate.
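The two-stage segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give the weight function or say how the initial path is derived from the coarse-segmentation statistics. The sketch assumes a binary image stored as nested lists (1 = ink), an initial cut column `x0` chosen by the caller, and hypothetical weights `ink_cost` and `step_cost`. The names `coarse_blocks` and `min_weight_path` are illustrative only.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels; 1 = ink (foreground), 0 = background


def column_profile(img: Image) -> List[int]:
    """Vertical projection: number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def coarse_blocks(img: Image) -> List[Tuple[int, int]]:
    """Coarse segmentation: split at empty columns.

    Returns half-open column ranges [start, end) that contain ink.
    """
    blocks: List[Tuple[int, int]] = []
    start = None
    # A trailing 0 closes a block that reaches the right border.
    for x, count in enumerate(column_profile(img) + [0]):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            blocks.append((start, x))
            start = None
    return blocks


def min_weight_path(img: Image, x0: int, radius: int,
                    ink_cost: int = 10, step_cost: int = 1) -> List[int]:
    """Fine segmentation: cheapest top-to-bottom cut near column x0.

    The path visits one pixel per row, moves at most one column sideways
    between rows, and stays within [x0 - radius, x0 + radius].
    Assumed weights: ink_cost per ink pixel crossed, step_cost per
    sideways move. Returns the cut column for every row.
    """
    height, width = len(img), len(img[0])
    cols = list(range(max(0, x0 - radius), min(width - 1, x0 + radius) + 1))
    inf = float("inf")
    cost = [[inf] * len(cols) for _ in range(height)]
    back = [[0] * len(cols) for _ in range(height)]
    for j, x in enumerate(cols):
        cost[0][j] = img[0][x] * ink_cost
    for y in range(1, height):
        for j, x in enumerate(cols):
            for dj in (0, -1, 1):  # prefer going straight on ties
                k = j + dj
                if 0 <= k < len(cols):
                    c = cost[y - 1][k] + abs(dj) * step_cost
                    if c < cost[y][j]:
                        cost[y][j] = c
                        back[y][j] = k
            cost[y][j] += img[y][x] * ink_cost
    # Trace back from the cheapest end point in the last row.
    j = min(range(len(cols)), key=lambda i: cost[-1][i])
    path = [0] * height
    for y in range(height - 1, -1, -1):
        path[y] = cols[j]
        j = back[y][j]
    return path
```

In this sketch, a block returned by `coarse_blocks` that is much wider than a typical character would be treated as touching characters, and `min_weight_path` would then be called with `x0` set, for example, to the block's midpoint. That choice of `x0` is an assumption made here for illustration.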

  • 【Online publication contributor】 Soochow University (苏州大学)
  • 【Online publication year and issue】 2012, Issue 06

