Research on Chinese Handwritten Character Recognition (手写体汉字的计算机识别研究)

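【Author】 Zhao Yun (赵云)

【Advisor】 Chen Qinghu (陈庆虎)

【Author information】 Wuhan University of Technology, Signal and Information Processing, 2004, Master's degree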

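【Abstract (translated from Chinese)】 Computer recognition of handwritten Chinese characters is one of the hardest problems in pattern recognition. In our research and application work on the projects "Computer Handwriting Identification" (《计算机笔迹鉴别》) and "Networked Handwriting Retrieval" (《网络化笔迹检索》), we frequently need to pick common characters out of selected manuscripts for examination. Picking the required characters out of long handwritten manuscripts is tedious, labor-intensive, and error-prone. To improve the software's identification efficiency and make it automated and intelligent, the handwritten Chinese characters in these manuscripts need to be tracked and recognized automatically by computer. Handwritten Chinese character recognition is still an unsolved problem, the available literature is limited, and solving it completely in the short term is unlikely. This project, however, addresses only a subset of commonly used characters, which differs from conventional large-character-set recognition and makes a successful outcome feasible.

The main topics of this thesis are: the principles and methods of character recognition; preprocessing of Chinese character images; classification algorithms for Chinese character recognition; the application of neural networks to Chinese character recognition; and the design and development of a recognition system for commonly used characters.

The part on principles and methods introduces the general approaches and strategies used in character recognition: statistical decision methods based on mathematical features, and syntactic analysis methods based on structural features. Preprocessing of the manuscript image covers smoothing and noise removal, binarization, skew correction, line and character segmentation, normalization, and thinning. Classification consists of coarse classification and fine classification; each stage uses two complementary feature extraction algorithms, with a correspondingly different recognition strategy. The neural network part studies the BP neural network and its improved algorithms and designs the BP network needed for Chinese character recognition, with a 64-20-4 structure for the input, hidden, and output layers. The design was simulated and verified in Matlab 6.5.

Building on recent results in Chinese character recognition, this project designed and developed a recognition system with a three-level strategy. Level one uses the traditional peripheral feature method and the projection transform coefficient method to coarsely classify the candidate characters. Level two uses stroke density features and elastic sector-grid features based on four-direction stroke decomposition for fine classification. Level three uses the BP neural network algorithm, currently the most popular, to confirm the result and produce the final output.

The system was developed in Delphi 6.0. For relatively neat, regular handwriting, its recognition rate exceeds 98% (10 candidates), a satisfactory result.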
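The 64-20-4 BP network described above can be illustrated with a small sketch. This is not the thesis's Matlab 6.5 or Delphi 6.0 code. It is a minimal pure-Python backpropagation network with the same layer sizes. The sigmoid activation, squared-error loss, learning rate, one-hot targets, and random toy patterns are assumptions made for illustration, and the names `BPNetwork` and `train_toy` are hypothetical. The abstract does not say what the 4 output nodes encode.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """Fully connected 64-20-4 network trained with plain backpropagation."""

    def __init__(self, n_in=64, n_hidden=20, n_out=4, seed=0):
        rng = random.Random(seed)
        # Small random weights; the last entry of each row is the bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        """Return (hidden activations, output activations) for one input."""
        xb = list(x) + [1.0]  # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.2):
        """One online gradient step; returns the error measured before the update."""
        h, y = self.forward(x)
        xb = list(x) + [1.0]
        hb = h + [1.0]
        # Output deltas for squared error with sigmoid units.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
        # Hidden deltas, computed from w2 before w2 is modified.
        d_hid = [h[j] * (1.0 - h[j]) *
                 sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]
        return sum((t - o) ** 2 for t, o in zip(target, y)) / 2.0


def train_toy(epochs=200, seed=1):
    """Train on four random 64-value patterns (toy data, not real characters)."""
    rng = random.Random(seed)
    patterns = [[float(rng.randint(0, 1)) for _ in range(64)] for _ in range(4)]
    # Assumption: one-hot targets, one output node per toy class.
    targets = [[1.0 if k == c else 0.0 for k in range(4)] for c in range(4)]
    net = BPNetwork()
    history = []
    for _ in range(epochs):
        history.append(sum(net.train_step(x, t)
                           for x, t in zip(patterns, targets)))
    return net, history
```

Calling `train_toy()` returns the trained network and the summed error per epoch, which can be inspected to check that training converges on the toy data. A real system would feed the 64 inputs from extracted character features rather than random values.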

【Abstract】 Computer recognition of handwritten Chinese characters is considered one of the most difficult problems in pattern recognition. In our projects "Computer Chinese Handwriting Identification" and "Chinese Handwriting Sort in Internet", we often need to pick particular handwritten characters out of a manuscript for identification. This work is tedious and error-prone. To make the software more automated and intelligent, it needs to pick the characters automatically, which is a handwritten character recognition task. The general problem is too large and difficult to solve completely. In our research, however, we only need to pick a small set of particular characters from the manuscript, which makes a successful solution possible.

The main research content of this thesis includes: the basic theory and methods of character recognition, preprocessing of the script image, classification algorithms, neural networks, and the system design for recognizing commonly used characters.

The thesis introduces two basic approaches in optical character recognition (OCR): statistical decision algorithms based on the mathematical features of characters, and structural decomposition algorithms based on their structural features. It then describes six preprocessing steps: noise removal, binarization, skew correction, segmentation, normalization, and thinning. The classification algorithms comprise coarse classification and fine classification, for which we adopt two different feature extraction methods and corresponding recognition algorithms. Next, we study the BP neural network algorithm and its improved variants and design a BP neural network for handwritten Chinese character recognition. The network has 64 input nodes, 20 nodes in the middle layer, and 4 output nodes, and we use Matlab to train and simulate it. Finally, we build software that combines the theory and methods above to validate the approach.

In this project, a new three-level handwritten character recognition system has been designed. The first level uses traditional peripheral features for coarse classification. The second level uses elastic grid features for fine classification. The third level uses a neural network to produce the final output. The software was implemented in Delphi 6.0. Experiments show a correct recognition rate of 98% (10 candidates), a satisfying and encouraging result.
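The reported rate is qualified by "10 candidates". Assuming this has the usual meaning of a candidate-based recognition rate (a sample counts as correct when the true character appears among the first 10 ranked candidates the system returns), the metric can be sketched as follows. The function name `top_n_rate` and the sample data are hypothetical, not taken from the thesis.

```python
def top_n_rate(ranked_candidates, truths, n=10):
    """Fraction of samples whose true label is among the first n candidates.

    ranked_candidates: one list per sample, best candidate first.
    truths: the true label of each sample, in the same order.
    """
    if len(ranked_candidates) != len(truths):
        raise ValueError("need exactly one candidate list per true label")
    if not truths:
        return 0.0
    hits = sum(1 for cands, truth in zip(ranked_candidates, truths)
               if truth in cands[:n])
    return hits / len(truths)


# Toy example with two samples and short candidate lists.
candidates = [["大", "太", "犬"], ["人", "入"]]
labels = ["太", "八"]
rate_top2 = top_n_rate(candidates, labels, n=2)  # first sample hits, second misses
```

A rate measured this way is always at least as high as the first-choice rate, so a 10-candidate figure is not directly comparable with a top-1 recognition rate.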

  • 【Classification code】 TP391.4
  • 【Times cited】 11
  • 【Downloads】 537