

Research on Chinese Handwriting Identification Algorithm

【Author】 张伟 (Zhang Wei)

【Advisor】 白雪冰 (Bai Xuebing)

【Author information】 东北林业大学 (Northeast Forestry University), Agricultural Electrification and Automation, 2009, Master's

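【Summary】 Using image processing techniques and pattern recognition theory, this study investigates text-independent, offline computer handwriting identification. It establishes a system of texture parameters that reflect handwriting characteristics and, based on these parameters, a pattern recognition method for handwriting identification, laying a theoretical and technical foundation for computer handwriting identification. Handwriting samples were collected from 60 people and converted to digital images with a scanner, giving a sample database of 360 (60×6) images. Preprocessing of the handwriting images comprises removal of the paper background color, grayscale conversion, denoising, binarization, and normalization. The background color is removed with an image pixel color picker; three grayscale conversion methods were analyzed and the weighted average method was adopted; two denoising methods were studied and, based on experiments, median filtering was adopted; binarization uses the S. Watanabe method. Normalization comprises skew correction, punctuation removal, character segmentation, character size normalization, and text stitching; three size normalization methods were analyzed and the single-side bounding method was chosen by experimental comparison. Four common texture analysis methods were analyzed and the Gabor transform was selected for handwriting texture analysis. The characteristics and properties of the Gabor transform were studied, and Gabor filters were designed according to optimal filter design principles. Three different Gabor filter parameter settings yielded three sets of handwriting texture features: 16 features in the first set, 48 in the second, and 24 in the third. The influence of different kernel functions on the classification performance of the support vector machine (SVM) was studied. Experiments compared the classification performance of k-nearest neighbor, BP neural network, and SVM classifiers. To select the SVM and kernel function parameters, a genetic algorithm was used for parameter optimization, yielding parameters that give relatively good SVM classification performance within the given parameter range. An SVM (Gaussian radial basis kernel function) with parameters optimized by the genetic algorithm is used as the classifier to identify unknown samples. Feature selection methods in pattern recognition were studied, with nearest-neighbor classification accuracy as the evaluation function for feature selection. Experiments compared the search performance of simulated annealing and the genetic algorithm, and a feature selection method combining nearest-neighbor classification accuracy with the genetic algorithm was adopted. The classification results of the three feature sets before and after feature selection were compared. The parameter system characterizing handwriting texture and the pattern recognition method were then finalized. Lastly, handwriting identification with large samples was discussed. This study performs offline, text-independent handwriting identification based on handwriting texture features; its results can provide some reference for computers replacing humans in handwriting identification and enrich the methods of handwriting analysis and identification in the field of image processing.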
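The two preprocessing choices named in the summary, weighted-average grayscale conversion and median filtering, can be illustrated with a minimal pure-Python sketch. The luminance weights (0.299, 0.587, 0.114), the 3×3 window, and the function names `to_gray` and `median_filter3` are assumptions made for illustration; the abstract does not state the weights or window size used in the thesis.

```python
from statistics import median

# Assumed luminance weights (ITU-R BT.601); the abstract does not state
# which weights the thesis used.
W_R, W_G, W_B = 0.299, 0.587, 0.114


def to_gray(rgb_image):
    """Weighted-average grayscale conversion of a 2-D list of (R, G, B) tuples."""
    return [[round(W_R * r + W_G * g + W_B * b) for (r, g, b) in row]
            for row in rgb_image]


def median_filter3(gray):
    """3x3 median filter; border pixels are left unchanged for simplicity."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [gray[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


if __name__ == "__main__":
    # A flat light-gray patch with one dark noise pixel in the middle.
    rgb = [[(200, 200, 200)] * 5 for _ in range(5)]
    rgb[2][2] = (0, 0, 0)
    gray = to_gray(rgb)
    cleaned = median_filter3(gray)
    print(gray[2][2], cleaned[2][2])
```

The median filter replaces the isolated dark pixel with the value of its neighbors while leaving flat regions untouched, which is why it suits scanner speckle better than a mean filter that would smear stroke edges.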

【Abstract】 This study uses image processing and pattern recognition theory to investigate text-independent, offline computer handwriting identification. It establishes a set of texture parameters that reflect handwriting characteristics and builds a pattern recognition method on those parameters, laying a theoretical and technical foundation for computer handwriting identification. Handwriting samples from 60 people were collected and digitized with a scanner, producing a sample database of 360 (60 × 6) images. Preprocessing consists of background color removal, grayscale conversion, denoising, binarization, and normalization. The background color is removed with a pixel color picker; three grayscale conversion methods are analyzed and the weighted average method is adopted; two denoising methods are studied and, based on experiments, the median filter is adopted; the S. Watanabe method is used for binarization. Normalization includes skew correction, punctuation removal, character segmentation, character size normalization, and text stitching; three size normalization methods are analyzed and, after experimental comparison, the single-side bounding method is adopted. After analyzing four common texture analysis methods, the Gabor transform is chosen for handwriting texture analysis. The characteristics and properties of the Gabor transform are studied, and the Gabor filters are designed according to the principles of optimal filter design.
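As a rough illustration of Gabor-based texture features, the sketch below builds a small bank of real (even-symmetric) Gabor kernels and records the mean and standard deviation of each channel's response magnitude. Everything specific here is an assumption: the kernel size, `sigma`, the frequencies, the number of orientations, and the use of mean and standard deviation as features are illustrative choices, not the filter parameters designed in the thesis.

```python
import math


def gabor_kernel(size, sigma, freq, theta):
    """Real (even-symmetric) Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * freq * xr))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Correlate image with kernel, keeping only fully overlapping positions."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(kernel[j][i] * image[y + j][x + i]
                 for j in range(k) for i in range(k))
             for x in range(w - k + 1)]
            for y in range(h - k + 1)]


def texture_features(image, freqs, n_orient, size=7, sigma=2.0):
    """Mean and standard deviation of each channel's response magnitude."""
    feats = []
    for f in freqs:
        for n in range(n_orient):
            theta = n * math.pi / n_orient
            resp = filter_valid(image, gabor_kernel(size, sigma, f, theta))
            vals = [abs(v) for row in resp for v in row]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            feats.extend([mean, math.sqrt(var)])
    return feats


if __name__ == "__main__":
    # Synthetic 16x16 texture of vertical stripes with a period of 4 pixels.
    img = [[1.0 if (x // 2) % 2 == 0 else 0.0 for x in range(16)]
           for _ in range(16)]
    feats = texture_features(img, freqs=[0.125, 0.25], n_orient=4)
    print(len(feats))
```

With 2 frequencies, 4 orientations, and 2 statistics per channel this toy bank produces 16 values; that count is a coincidence of the chosen illustration and should not be read as the composition of the thesis's 16-feature set.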
Three different settings of the Gabor filter parameters yield three sets of handwriting texture features: 16 features in the first set, 48 in the second, and 24 in the third. The thesis studies how different kernel functions affect SVM classification performance and experimentally compares k-nearest neighbor, BP neural network, and SVM classifiers. To select the SVM and kernel function parameters, a genetic algorithm is used for parameter optimization, yielding parameters that give relatively good SVM classification performance within the given parameter range. An SVM with a Gaussian RBF kernel and parameters optimized by the genetic algorithm serves as the classifier for identifying unknown samples. The thesis also studies feature selection methods in pattern recognition and uses nearest-neighbor classification accuracy as the evaluation criterion for feature selection. After comparing the search performance of simulated annealing and the genetic algorithm, it adopts a feature selection method that pairs nearest-neighbor classification accuracy with the genetic algorithm. The classification results of the three feature sets before and after feature selection are compared, and the parameter system characterizing handwriting texture and the pattern recognition method are finalized. Finally, the thesis briefly discusses handwriting identification with large samples. The study performs offline, text-independent handwriting identification based on texture features; its results offer a reference for computer-based handwriting identification in place of human examiners and add to the handwriting analysis and identification methods available in image processing.
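The evaluation criterion described above, nearest-neighbor classification accuracy for a candidate feature subset, can be sketched as a leave-one-out 1-NN score over a binary feature mask. The function names, the squared Euclidean distance, and the toy data are assumptions for illustration. The search step here is a plain exhaustive enumeration, used as a stand-in for the genetic algorithm, which a real feature set of 16 to 48 features would require.

```python
from itertools import product


def loo_1nn_accuracy(samples, labels, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask is 1."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    correct = 0
    for a, (xa, ya) in enumerate(zip(samples, labels)):
        best_dist, best_label = None, None
        for b, (xb, yb) in enumerate(zip(samples, labels)):
            if a == b:
                continue
            d = sum((xa[i] - xb[i]) ** 2 for i in idx)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, yb
        correct += (best_label == ya)
    return correct / len(samples)


def best_mask(samples, labels):
    """Exhaustive search over all masks (a stand-in for the genetic algorithm)."""
    n = len(samples[0])
    # Prefer higher accuracy, then fewer selected features.
    return max(product((0, 1), repeat=n),
               key=lambda m: (loo_1nn_accuracy(samples, labels, m), -sum(m)))


if __name__ == "__main__":
    # Toy data: feature 0 separates the two writers, feature 1 is noise.
    X = [(0.0, 5.0), (0.1, 0.0), (0.2, 9.0), (1.0, 0.2), (1.1, 5.1), (1.2, 9.1)]
    y = ["A", "A", "A", "B", "B", "B"]
    print(loo_1nn_accuracy(X, y, (1, 1)), best_mask(X, y))
```

A genetic algorithm would treat each mask as a chromosome and use `loo_1nn_accuracy` as its fitness function, replacing the enumeration in `best_mask` with selection, crossover, and mutation over a population of masks.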


