

Research on Off-line Handwritten Chinese Character Recognition System

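【Author】 Sun Quan (孙泉)

【Advisor】 Cheng Yu (成瑜)

【Author information】 Nanjing University of Aeronautics and Astronautics, Communication and Information Systems, 2007, Master's thesis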

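【Abstract (translated from the Chinese)】 Off-line handwritten Chinese character recognition is a highly challenging problem in pattern recognition, and it will play a major role in applications such as mail sorting, bank check recognition, statistical report processing, and automatic entry of handwritten manuscripts. However, handwritten Chinese characters vary greatly from writer to writer, and the spatial relationships between adjacent characters are complex and varied. As a result, progress in off-line handwritten Chinese character recognition has been markedly slower, and has met more obstacles, than progress in other character recognition tasks. The system is intended mainly for automatic entry of handwritten manuscripts. The main work is as follows: 1. In preprocessing, basic image smoothing is implemented, and the binarization strategy is chosen according to the paper background: an iterative optimal segmentation threshold algorithm for character images on blank paper, and a double-threshold method for character images on ruled manuscript paper. 2. The main thinning methods for handwritten Chinese characters developed over the years are reviewed and summarized, and an improved thinning algorithm is proposed in view of the system's primary use for character entry. 3. Several major methods for extracting statistical features and stroke-structure features are introduced. A new stroke-segment feature extraction algorithm is adopted for handwritten characters, and a new character segmentation algorithm based on stroke structure is also proposed. 4. In the recognition stage, an improved two-layer serial classifier structure is used, which shortens recognition time by 50% compared with a single-layer classifier. The training and test samples cover about 2000 level-1 and level-2 Chinese characters, each written in 6 different styles. The training samples are divided into two classes: the first consists of hand-printed characters, whose strokes are well separated and essentially horizontal and vertical; the second consists of neatly written ordinary characters, with a small number of connected strokes and shapes kept as regular as possible. With a different recognition method applied to each class, the recognition accuracy is 90% for the first class and 85% for the second.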
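To illustrate the iterative optimal-threshold idea named in item 1, here is a minimal Python sketch. It is not the thesis's implementation. It assumes the common ISODATA-style formulation, in which the threshold is repeatedly reset to the midpoint between the mean gray levels of the two classes it separates. The function names, the `eps` stopping tolerance, and the convention that dark pixels are ink are all assumptions made for this example.

```python
def iterative_threshold(pixels, eps=0.5, max_iter=100):
    """Pick a global threshold by iterating class means (ISODATA-style).

    pixels: flat iterable of gray levels (e.g. 0..255).
    eps and max_iter are illustrative stopping parameters (assumptions).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("pixels must not be empty")

    # Start halfway between the darkest and brightest gray level.
    t = (min(pixels) + max(pixels)) / 2.0
    for _ in range(max_iter):
        dark = [p for p in pixels if p <= t]
        bright = [p for p in pixels if p > t]
        if not dark or not bright:
            break  # only one class present; nothing left to separate
        new_t = (sum(dark) / len(dark) + sum(bright) / len(bright)) / 2.0
        converged = abs(new_t - t) < eps
        t = new_t
        if converged:
            break
    return t


def binarize(image, threshold):
    """Map a 2-D list of gray levels to 1 (ink, dark) / 0 (background)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny synthetic image: dark strokes on a bright blank background.
    image = [
        [200, 210, 30],
        [220, 40, 205],
        [35, 215, 225],
    ]
    flat = [p for row in image for p in row]
    t = iterative_threshold(flat)
    for row in binarize(image, t):
        print(row)
```

A single global threshold of this kind suits a roughly bimodal histogram, such as ink on blank paper. Ruled manuscript paper adds a third gray population (the printed lines), which is presumably why the abstract uses a different method for that case.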

【Abstract】 Off-line handwritten Chinese character recognition is a challenging problem in pattern recognition. It will play an important role in many applications, such as mail sorting, check recognition, report form processing, and automatic input of handwritten manuscripts. However, because Chinese characters vary so much from one writer to another, research on off-line handwritten Chinese character recognition has progressed markedly more slowly than research on other kinds of character recognition. The system is intended mainly for automatic input of handwritten manuscripts. The main work is as follows: 1. In preprocessing, smoothing, binarization, and normalization are implemented. In particular, different binarization methods are used for different paper backgrounds: an iterative optimal-threshold segmentation method for character images on a blank background, and a double-threshold method for images with frame lines. 2. Extensive study and testing of Chinese character segmentation were carried out, and a fuzzy-rule decision segmentation method was chosen because the system is intended for Chinese character auto-input. 3. Several statistical and structural feature extraction methods are introduced, and two new ones are used for the two kinds of characters: elastic meshing directional features are proposed for hand-printed characters, and stroke features of thinned characters are used for neatly written ordinary characters. In addition, an element tracing method and a cross-point stroke segment combination method are proposed to improve inflection point extraction and stroke segment combination. 4. In the recognition stage, two pre-classification methods are used, one for each of the two kinds of features above. An error equilibrium distance is used to compare elastic meshing directional features, and a new stroke matching method is used for the stroke features. In addition, an improved classifier structure reduces recognition time by 50%. The training and test samples cover 2000 different Chinese characters, with 6 samples per character. They are divided into two classes according to writing style. Test results show a recognition accuracy of 90% for the first class and 85% for the second.
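The abstract does not say how the elastic mesh is built. As a hedged illustration, the Python sketch below uses one common construction: grid lines are placed along each axis so that every band holds roughly the same amount of ink, based on the projection profile of the binarized character. The function names and the 1 = ink convention are assumptions, and the per-cell directional features themselves are not shown.

```python
def elastic_boundaries(profile, n_cells):
    """Split range(len(profile)) into n_cells bands of roughly equal ink mass.

    profile: ink count per row (or per column) of a binary character image.
    Returns n_cells + 1 boundary indices, starting at 0 and ending at
    len(profile). Bands may be empty if one line holds most of the ink.
    """
    total = sum(profile)
    if n_cells < 1 or total == 0:
        raise ValueError("need n_cells >= 1 and a non-empty character")

    bounds = [0]
    cumulative = 0
    k = 1  # index of the next interior boundary to place
    for i, ink in enumerate(profile):
        cumulative += ink
        # Place boundary k once k/n_cells of the total ink has been passed.
        while k < n_cells and cumulative * n_cells >= k * total:
            bounds.append(i + 1)
            k += 1
    bounds.append(len(profile))
    return bounds


def elastic_mesh(binary_image, n_rows, n_cols):
    """Return (row_bounds, col_bounds) for a 2-D list with 1 = ink."""
    row_profile = [sum(row) for row in binary_image]
    col_profile = [sum(col) for col in zip(*binary_image)]
    return (elastic_boundaries(row_profile, n_rows),
            elastic_boundaries(col_profile, n_cols))


if __name__ == "__main__":
    # Column ink profile of a made-up character: dense on the left,
    # sparse on the right, with a gap in between.
    profile = [0, 4, 4, 0, 0, 0, 2, 2, 2, 2]
    print(elastic_boundaries(profile, 4))
```

Compared with a uniform grid, this adapts the cell sizes to where the strokes actually fall, which is the usual motivation for elastic meshing when the same character is written with different proportions.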

  • 【Classification number】 TP391.43
  • 【Times cited】 4
  • 【Downloads】 351

