

Study on Recognition of Off-line Similar Handwritten Chinese Characters Based on Support Vector Machines

【Author】 封筠

【Advisor】 杨扬

【Author Information】 University of Science and Technology Beijing, Computer Application Technology, 2005, Ph.D.

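【Abstract (translated from the Chinese)】 Although off-line handwritten Chinese character recognition has broad application prospects, the inherent complexity of off-line handwritten Chinese characters makes a recognition system very difficult to build, and no fully mature product exists yet. Research shows that the presence of similar characters is one of the main causes of low recognition rates, so considerable effort must go into recognizing similar handwritten characters. Given the advantages of support vector machines (SVM) on small-scale, fine-grained classification problems, this dissertation takes off-line similar handwritten Chinese characters as its subject, studies in depth several core problems of SVM-based handwritten Chinese character recognition, and makes the following original contributions. First, based on a Riemannian-geometric analysis of kernel functions, an automatic model selection method for SVM is proposed. The method first uses a global optimization search that combines a coarse grid with pattern search to obtain optimized SVM model parameters according to a classifier performance evaluation criterion; it then applies the new conformal transformation proposed in the dissertation to modify the kernel function in a data-dependent way, further improving the classifier's generalization ability. Second, feature selection is studied in two forms: (1) for the single-objective feature selection problem, a scheme based on an improved single-objective GA and cross-validated SVM classification is proposed; (2) for the multi-objective feature selection problem, a scheme based on a Pareto-dominance MOGA and SVM classification is proposed. Both schemes are wrapper methods that use feedback from the SVM classifier, and both obtain lower-dimensional feature vectors without reducing the system's generalization performance. Third, to address the problems of the DAGSVM classifier, a new structure-optimized fuzzy multi-class DAGSVM classifier is proposed. Following the classifier performance evaluation criterion, an algorithm is given for obtaining the structure-optimized DAGSVM off-line in the training stage; in the recognition stage, the fuzzy multi-class DAGSVM classifier uses a fuzzy membership function and an averaging operator to obtain the classification result. Compared with other multi-class SVM classifiers based on combination strategies, this classifier achieves higher recognition accuracy and recognition speed. Finally, based on an analysis of the similarity properties of objectively similar Chinese characters, a fairly practical sample library of similar handwritten Chinese characters is built, laying a foundation for further research; a recognition scheme for similar handwritten Chinese characters is proposed that combines wavelet elastic-mesh feature extraction, genetic-algorithm feature selection, and SVM classification, and the experimental results demonstrate its feasibility and effectiveness.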

【Abstract】 Off-line handwritten Chinese character recognition has wide potential application, but the complexity of the characters makes a recognition system very difficult to realize, and no mature product exists so far. Investigation shows that the existence of many similar characters is a key factor limiting the recognition rate, so much deeper research is needed on the difficult problem of recognizing similar handwritten Chinese characters. Because support vector machines (SVM) are well suited to recognition problems with small samples and few categories, this dissertation investigates several key problems in applying SVM to similar handwritten Chinese character recognition. Its contributions are as follows.

Firstly, an approach to automatic model selection for the SVM classifier is proposed, based on a Riemannian geometry analysis of the kernel function. The optimal model parameters are obtained from an evaluation criterion for model selection and a global search algorithm that combines a coarse grid with pattern search. The novel conformal transformation presented in this dissertation is then used to modify the kernel function in a data-dependent way. The experimental results show a remarkable improvement in the generalization performance of the classifier.

Secondly, two different schemes for feature selection are investigated. One, for the single-objective feature selection problem, is based on an improved single-objective genetic algorithm and cross-validation-based SVM classification. The other, for the multi-objective feature selection problem, is based on a Pareto-optimality-based multi-objective genetic algorithm and SVM classification. Both are wrapper methods, which use feedback information from the SVM classifier. A low-dimensional feature vector can be found without any loss of generalization performance.

Thirdly, to address the deficiencies of the DAGSVM classifier, a novel fuzzy multi-class DAGSVM classifier based on an optimized structure is proposed. An algorithm that optimizes the structure of the DAGSVM according to the performance evaluation criterion is introduced for the training stage. In the testing stage, the fuzzy multi-class DAGSVM classifier obtains the final recognition result using a fuzzy membership function and an average operator. The experimental results show that both the recognition accuracy and the recognition speed of the novel classifier are better than those of pair-wise SVM classifiers with other combination strategies.

Lastly, a practical sample library of similar handwritten Chinese characters is built from an analysis of the similarity properties of objectively similar handwritten Chinese characters, which provides a foundation for future research. A recognition approach for similar handwritten Chinese characters is presented that extracts the feature vector by wavelet transformation and an elastic meshing technique, selects a feature subset by genetic algorithms, and classifies by SVM. The experimental results confirm the effectiveness and practicality of the approach.
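
The first contribution pairs a coarse grid with pattern search to pick SVM model parameters. The abstract does not give the evaluation criterion, the parameter ranges, or the exact pattern-search variant, so the following is only a minimal sketch under stated assumptions: the two parameters are taken to be log2(C) and log2(gamma), the refinement is a plain compass (coordinate) search with step halving, and `toy_error` is a hypothetical stand-in for a validation-error surface. All function names are illustrative and do not come from the dissertation.

```python
from itertools import product
from typing import Callable, Iterable, Sequence, Tuple

Point = Tuple[float, ...]


def coarse_grid(objective: Callable[[Point], float],
                axes: Sequence[Iterable[float]]) -> Point:
    """Evaluate the objective on every grid node and return the best node."""
    return min(product(*axes), key=objective)


def pattern_search(objective: Callable[[Point], float], start: Point,
                   step: float = 1.0, tol: float = 1e-3) -> Tuple[Point, float]:
    """Compass search: probe +/- step along each axis, halve the step on failure."""
    best = tuple(start)
    best_val = objective(best)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = best[:i] + (best[i] + delta,) + best[i + 1:]
                val = objective(cand)
                if val < best_val:
                    best, best_val, improved = cand, val, True
        if not improved:
            step /= 2.0  # no probe helped at this scale, so refine the mesh
    return best, best_val


def toy_error(p: Point) -> float:
    """Hypothetical error surface over (log2 C, log2 gamma); assumption only."""
    log_c, log_g = p
    return (log_c - 3.3) ** 2 + 0.5 * (log_g + 4.7) ** 2


def select_model(objective: Callable[[Point], float] = toy_error):
    """Coarse grid to find a good region, then pattern search to refine it."""
    axes = [range(-5, 16, 4), range(-15, 4, 4)]  # assumed search ranges
    seed = coarse_grid(objective, axes)
    return pattern_search(objective, seed)


if __name__ == "__main__":
    params, err = select_model()
    print("log2(C), log2(gamma):", params, "error:", err)
```

In practice `toy_error` would be replaced by a function that trains an SVM with the candidate parameters and returns the chosen criterion, for example a cross-validation error. The coarse grid keeps the number of expensive evaluations small, while the pattern search refines the best node without needing gradients.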
