

Research and Implementation of Feature Extraction Based on Offline Handwritten Chinese Recognition

【Author】 Liu Wei (刘伟)

【Advisor】 Zhu Ningbo (朱宁波)

【Author information】 Hunan University, Computer Software and Theory, 2007, Master's degree

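【Abstract (Chinese version)】 Chinese character recognition is the use of a computer to automatically identify Chinese characters printed or handwritten on paper; as a discipline it belongs to pattern recognition and artificial intelligence. It draws on pattern recognition, image processing, artificial intelligence, formal languages and automata, fuzzy mathematics, combinatorics, information theory, and Chinese information processing, and also on linguistics, psychology, and bionics, which makes it a comprehensive technology. Chinese character recognition is a very difficult pattern recognition problem. Objectively, Chinese characters form a special set of patterns: there are many pattern classes, the structures are very complex, and some patterns are very similar to one another. Together with the effects of print quality and interference, and the irregular shapes that result from the arbitrariness of handwriting, these factors make character recognition very difficult. First, preprocessing plays an important role in handwritten Chinese character recognition. This thesis discusses preprocessing methods for handwritten Chinese characters, implements conventional binarization and smoothing algorithms, and implements a nonlinear normalization method based on the principle of density equalization over the effective region of the image. Compared with several other methods, it more effectively reduces the differences among characters of the same class and more effectively improves the recognition rate of handwritten Chinese characters. For feature extraction, the thesis proposes a fuzzy sub-stroke extraction method that addresses the instability of extracted sub-strokes caused by the arbitrariness of unconstrained handwriting. The fuzzy sub-stroke attribute features of character edge points are computed for the "horizontal" (横), "vertical" (竖), "left-falling" (撇), and "right-falling" (捺) sub-strokes and combined with a fuzzy mesh to generate fuzzy sub-stroke statistical features. Also for feature extraction, the thesis proposes a method based on sub-blocks and their related fuzzy features. This method takes into account both the distribution of strokes and the correlations in the topological structure of Chinese characters. It imitates the mechanism by which people recognize Chinese characters and is well suited to handwritten characters with widely varying writing styles, strong arbitrariness, and large structural deformation. Finally, the thesis introduces a machine exam-marking system, including its application environment, main functions, and the main technologies used. The author was mainly responsible for processing the filled-in answer regions and carried out name recognition experiments using the methods proposed in the thesis.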
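The nonlinear normalization step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's density function, so this example assumes a simple projection density (ink pixels per row or column plus a constant `alpha`) computed inside the character's bounding box, which stands in for the "effective region". The function names `bounding_box`, `inverse_map`, and `normalize` are hypothetical and are not taken from the thesis.

```python
def bounding_box(img):
    """Return (top, bottom, left, right), inclusive, of the ink pixels."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    return rows[0], rows[-1], cols[0], cols[-1]


def inverse_map(density, out_size):
    """For each output index, pick the source index whose cumulative
    density covers the same fraction of the total density.
    Dense source regions therefore receive more output positions."""
    total = float(sum(density))
    mapping, src, running = [], 0, density[0]
    for k in range(out_size):
        target = (k + 0.5) / out_size * total
        while running < target and src < len(density) - 1:
            src += 1
            running += density[src]
        mapping.append(src)
    return mapping


def normalize(img, out_size=8, alpha=1.0):
    """Resample a binary image (list of 0/1 rows) to out_size x out_size.
    Assumption: density = ink count per row/column + alpha."""
    top, bottom, left, right = bounding_box(img)
    box = [row[left:right + 1] for row in img[top:bottom + 1]]
    row_density = [sum(row) + alpha for row in box]
    col_density = [sum(col) + alpha for col in zip(*box)]
    row_src = inverse_map(row_density, out_size)
    col_src = inverse_map(col_density, out_size)
    # Backward mapping: every output pixel reads one source pixel.
    return [[box[r][c] for c in col_src] for r in row_src]
```

With a uniform density the mapping reduces to ordinary linear scaling. Rows and columns that carry more ink are stretched, and sparse ones are compressed, which is the general idea behind density equalization.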

【Abstract】 Chinese character recognition is the automatic recognition, by computer, of Chinese characters printed or written on paper. It belongs to the fields of pattern recognition and artificial intelligence. It draws on pattern recognition, image processing, artificial intelligence, formal languages and automata, fuzzy mathematics, combinatorics, information theory, and Chinese information processing, as well as linguistics, psychology, and bionics, which makes it a comprehensive technology. Chinese character recognition is a very difficult pattern recognition problem. Chinese characters form a special pattern set with many classes and complicated structures, and some patterns are very similar. Poor print quality, noise, and the irregular shapes of handwritten characters make recognition even more difficult. First, preprocessing plays an important role in handwritten Chinese character recognition. In the preprocessing step, conventional thresholding and smoothing are implemented. In addition, a modified nonlinear normalization method based on density equalization of the effective area is implemented; compared with other normalization methods, it reduces within-class differences more effectively and so improves the recognition rate. A fuzzy sub-stroke extraction method is proposed to resolve the instability of extracted sub-strokes caused by unconstrained handwriting. First, the attribute features of each boundary point with respect to the four fuzzy sub-strokes (horizontal, vertical, left diagonal, and right diagonal) are calculated; then a fuzzy mesh is combined with these features to obtain the fuzzy sub-stroke statistical feature of a Chinese character. In addition, an approach using block features and their related fuzzy features, based on an elastic mesh, is presented. The method imitates the mechanism by which people recognize Chinese characters. It is effective, especially for handwritten Chinese characters written in different styles and with large distortion. Finally, a practical exam-paper marking system that we developed is introduced, including its application environment, primary functions, and main technologies. My main task was to process the filled-in answer areas and to use the methods presented in this thesis to recognize handwritten name characters.
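The fuzzy sub-stroke statistical feature described above can be sketched as follows. The abstract does not state the membership functions used in the thesis, so this example makes two assumptions: the direction membership of a boundary point is the share of its 3x3 ink neighbours lying along each of the four directions, and the fuzzy mesh uses linear (triangular) weights to the nearest cell centres. All names here (`DIRECTIONS`, `direction_memberships`, `cell_weights`, `fuzzy_substroke_features`) are hypothetical.

```python
# Neighbour offsets (d_row, d_col) along each sub-stroke direction.
DIRECTIONS = {
    "horizontal": [(0, -1), (0, 1)],
    "vertical": [(-1, 0), (1, 0)],
    "left_diagonal": [(1, -1), (-1, 1)],    # upper right to lower left
    "right_diagonal": [(-1, -1), (1, 1)],   # upper left to lower right
}


def pixel(img, r, c):
    """Pixel value, treating everything outside the image as background."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0


def is_boundary(img, r, c):
    """An ink pixel with at least one background 4-neighbour."""
    return img[r][c] == 1 and any(
        pixel(img, r + dr, c + dc) == 0
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)))


def direction_memberships(img, r, c):
    """Assumed membership: share of ink neighbours along each direction."""
    counts = {name: sum(pixel(img, r + dr, c + dc) for dr, dc in offs)
              for name, offs in DIRECTIONS.items()}
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in DIRECTIONS}
    return {name: n / total for name, n in counts.items()}


def cell_weights(pos, length, cells):
    """Linear membership of a pixel to its two nearest cell centres."""
    x = (pos + 0.5) / length * cells - 0.5
    x = min(max(x, 0.0), cells - 1.0)
    lo = int(x)
    hi = min(lo + 1, cells - 1)
    frac = x - lo
    return [(lo, 1.0 - frac), (hi, frac)]


def fuzzy_substroke_features(img, cells=2):
    """Return a flat vector of cells * cells * 4 accumulated memberships."""
    names = list(DIRECTIONS)
    feats = [[[0.0] * len(names) for _ in range(cells)] for _ in range(cells)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            if not is_boundary(img, r, c):
                continue
            member = direction_memberships(img, r, c)
            # Each boundary point contributes to neighbouring mesh cells.
            for gr, wr in cell_weights(r, h, cells):
                for gc, wc in cell_weights(c, w, cells):
                    for k, name in enumerate(names):
                        feats[gr][gc][k] += wr * wc * member[name]
    return [v for row in feats for cell in row for v in cell]
```

Because each boundary point spreads its contribution over several directions and several mesh cells, a small shift or tilt in a stroke changes the feature vector gradually rather than abruptly, which is the motivation the abstract gives for the fuzzy formulation.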

  • 【Online publication contributor】 Hunan University
  • 【Online publication year/issue】 2007, Issue 05
  • 【Classification number】 TP391.43
  • 【Times cited】 12
  • 【Downloads】 428

