脱机印刷体维吾尔文字识别特征选择和分类器设计方法的研究
The Research of Feature Selection and Classifier Design for Printed Offline Uygur Character Recognition
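【Author】 Jia Jianzhong (贾建忠);
【Supervisor】 Gong Shengrong (龚声蓉);
【Author information】 Soochow University (苏州大学), Computer Application Technology, 2008, Master's thesis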
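【Summary】 Character recognition is an important application area of pattern recognition. At present, research on recognition of Arabic script, and of Uyghur script, which is based on the Arabic alphabet, lags behind that for other scripts. Developing Uyghur character recognition technology is of great significance for studying the history and culture, religious beliefs, ancient documents, and written records of the ethnic minorities of western China. Building on a detailed analysis of the characteristics of Uyghur script and the difficulties in recognizing it, this thesis studies and experiments with printed Uyghur recognition in terms of document image preprocessing, text segmentation, feature extraction, and classifier design. The main results are as follows: 1. Document image preprocessing methods for offline printed Uyghur are examined in depth; binarization, smoothing and denoising, thinning, and normalization are implemented experimentally, preparing for subsequent character recognition. 2. By studying how Uyghur differs from Latin and Chinese scripts, a recognition approach is proposed that first segments text lines, then segments words, and finally recognizes letters, and extensive related experiments are carried out. An approach to holistic recognition using hidden Markov models, with an outline of how it could be implemented, is also proposed. 3. Based on the writing characteristics of Uyghur, several feature extraction methods for binary character images are proposed, including template features, loop features, connected-region features, additional-stroke features, stroke-density features, and projection-transform-coefficient features; these serve as input features for training a BP neural network classifier. 4. On the basis of character image preprocessing and feature extraction, a Uyghur character recognition classifier based on the BP neural network model is designed and implemented. The classifier converged in training experiments on the sample set and performed well in Uyghur character recognition experiments, reaching a recognition rate of 98.21% on printed characters.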
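As an illustration of the kind of binary-image features listed in item 3, the following is a minimal sketch of stroke-density and projection features. It is not the thesis's implementation: the function names, the 2×2 zoning, and the toy glyph are all assumptions made for the example.

```python
def zone_density_features(image, zones=2):
    """Split a binary image (rows of 0/1) into zones x zones cells and
    return the fraction of foreground pixels per cell, in row-major order.
    The zoning scheme is an assumption, not taken from the thesis."""
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * height // zones, (zr + 1) * height // zones
        for zc in range(zones):
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            features.append(ink / area if area else 0.0)
    return features


def projection_profile(image):
    """Return (row sums, column sums) of foreground pixels."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows, cols


# Hypothetical 4x4 binary glyph, 1 = ink.
glyph = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

densities = zone_density_features(glyph, zones=2)
row_profile, col_profile = projection_profile(glyph)
feature_vector = densities + row_profile + col_profile
```

Concatenating such values into one vector is one plausible way to form the input of a neural network classifier; the feature set and encoding actually used in the thesis are not specified in this record.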
【Abstract】 Character recognition is a major application area of pattern recognition. At present, research on recognition technology for Arabic script, and for Uyghur script, which is based on Arabic letters, lags behind, largely because of the characteristics of the scripts themselves. Developing Uyghur character recognition technology is important for studying the history, culture, and religion of minorities and for preserving the textual records and ancient literature of minorities in western China. The research also has some reference value for Arabic character recognition. Based on a detailed analysis of the characteristics of Uyghur and the difficulties in recognizing it, this thesis carries out research and experiments on image preprocessing, text segmentation, feature extraction, classification, and other aspects of printed Uyghur recognition. The work focuses on the design and implementation of BP neural network classifiers for Uyghur character recognition. The main results are as follows: 1. Image preprocessing methods for offline printed Uyghur characters are studied in depth. Binarization, smoothing, and normalization of the original image are implemented, laying the foundation for further work. 2. By comparing the characteristics of Uyghur, English, and Chinese text, a method is proposed that first segments lines, then separates words, and finally identifies letters; a large number of experiments are completed. The thesis also proposes a holistic recognition method using hidden Markov models (HMM). 3. According to the characteristics of Uyghur writing, a variety of feature extraction methods are introduced, covering template features, aspect ratio, loops, Euler number, link features, and strokes in different directions. The combination of these features provides the input vector to the ANN classifier. 4. The thesis explores the use of a neural network model for Uyghur character recognition and describes the specific design process of a BP neural network classifier built with the MATLAB toolbox. The ANN classifier gives good experimental results, with a printed character recognition rate of 98.21 percent.
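The line-then-word segmentation order described in item 2 is commonly realized with projection profiles. The sketch below shows that general technique under stated assumptions; the abstract does not say which segmentation algorithm the thesis uses, and the function names and toy page are hypothetical.

```python
def split_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, of consecutive positions
    whose profile value exceeds the threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(page):
    """Find text lines as runs of rows that contain ink."""
    return split_runs([sum(row) for row in page])


def segment_words(page, line):
    """Within one line (top, bottom), find runs of columns that contain ink."""
    top, bottom = line
    width = len(page[0])
    columns = [sum(page[r][c] for r in range(top, bottom)) for c in range(width)]
    return split_runs(columns)


# Hypothetical binary page, 1 = ink: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]

lines = segment_lines(page)
words_per_line = [segment_words(page, line) for line in lines]
```

This sketch treats every blank column as a word boundary. Because Uyghur words can contain gaps between non-joining letters, a practical system would likely need a minimum gap width or similar rule to tell inter-word spaces from intra-word gaps.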
【Key words】 image preprocessing; feature extraction and selection; character recognition; samples; neural network;