

Research on Constrained Handwritten Optical Character Recognition

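【Author】 Du Yanrui (杜彦蕊)

【Advisor】 Guo Lianqi (郭连骐)

【Author information】 Harbin Engineering University, Signal and Information Processing, 2003, Master's thesis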

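【Abstract (translated from the Chinese)】 The system targets constrained handwritten characters (the 10 Arabic numerals and the 52 uppercase and lowercase English letters, 62 characters in all). The CC-OCR system developed in this thesis covers the complete process from scanner input to computer recognition. The thesis proposes and implements a multilevel classification and recognition method based on feature coding: first-level classification is achieved by extracting a sufficient number of effective features from each character and encoding them, and characters that still cannot be distinguished after the first level enter a second level, where template matching finally separates them. The focus of the method is the first-level stage. Experimental results show that this feature-coding-based multilevel method is feasible and effective. In the preprocessing stage, the system preprocesses the character bitmap, laying a good foundation for subsequent feature extraction and recognition. In the first-level classification stage, the thesis proposes a border-table extremum-difference feature, a left-border-table discontinuity feature, an improved width feature, and features that compare the average crossing feature, taken over different local ranges chosen for the characters being distinguished, against a threshold. Combined with a number of existing features, these give the first level good ability to classify characters. The hardware consists of a scanner and a computer, and the program is implemented in C and VC++ 6.0.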

【Abstract】 This system targets constrained handwritten characters: the 10 Arabic numerals plus the 26 uppercase and 26 lowercase English letters, 62 characters in total. The CC-OCR system developed by the author covers the whole process from scanning the characters to recognizing them on the computer. This dissertation proposes and implements a multilevel classification method based on feature coding. The method performs first-level classification by extracting a sufficient number of effective features from each character and encoding them; characters that the first level cannot distinguish are passed to a second level, which uses template matching to recognize them. The emphasis of the method is on the first-level classification phase. Experiments show that the method is feasible and effective. In the preprocessing phase, each character bitmap is fed through a preprocessor, which simplifies feature extraction and recognition. In the first-level classification phase, the dissertation proposes four features: a border-table extremum-difference feature (the difference between the maximum and minimum of the border table), a left-border-table discontinuity feature, an improved width feature, and an average crossing-count feature that is compared against a threshold. Combined with some existing features, these give the first level good classification ability. The system hardware consists of a scanner and a computer, and the program is written in C and VC++ 6.0.
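The two-level scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not define the features precisely, so everything below is an assumption made for illustration: the helper names (`crossing_count`, `left_border`, `feature_code`, `classify`), the three-bit code, the thresholds, the Hamming-distance template match, and the 5x6 toy bitmaps are all hypothetical and are not the author's implementation. Python is used here for brevity, although the original program was written in C and VC++ 6.0.

```python
from typing import Dict, List, Tuple

Bitmap = List[str]  # rows of '#' (ink) and '.' (background); toy format, assumed

def crossing_count(row: str) -> int:
    """Count background-to-ink transitions along one row."""
    count, prev = 0, '.'
    for px in row:
        if px == '#' and prev != '#':
            count += 1
        prev = px
    return count

def mean_crossings(bitmap: Bitmap, start: int, stop: int) -> float:
    """Average crossing count over rows [start, stop); the range must be non-empty."""
    rows = bitmap[start:stop]
    return sum(crossing_count(r) for r in rows) / len(rows)

def left_border(bitmap: Bitmap) -> List[int]:
    """Column of the first ink pixel in each row (row width if the row is blank)."""
    return [r.find('#') if '#' in r else len(r) for r in bitmap]

def feature_code(bitmap: Bitmap, threshold: float = 1.5) -> Tuple[int, int, int]:
    """Level 1: encode three assumed binary features as a tuple."""
    half = len(bitmap) // 2
    top = int(mean_crossings(bitmap, 0, half) > threshold)
    bottom = int(mean_crossings(bitmap, half, len(bitmap)) > threshold)
    border = left_border(bitmap)
    spread = int(max(border) - min(border) > 1)  # extremum difference of the border
    return (top, bottom, spread)

def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def classify(bitmap: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Level 1 narrows candidates by feature code; level 2 matches templates."""
    code = feature_code(bitmap)
    candidates = [k for k, t in templates.items() if feature_code(t) == code]
    if len(candidates) == 1:
        return candidates[0]
    pool = candidates or list(templates)  # fall back to all templates if none match
    return min(pool, key=lambda k: hamming(bitmap, templates[k]))

# Hypothetical 5x6 templates, for illustration only.
TEMPLATES: Dict[str, Bitmap] = {
    "0": [".###.", "#...#", "#...#", "#...#", "#...#", ".###."],
    "1": ["..#..", "..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#....", "#####"],
    "7": ["#####", "....#", "...#.", "..#..", "..#..", "..#.."],
}

noisy_l = ["#....", "#....", "#....", "#....", "#....", "####."]
print(classify(noisy_l, TEMPLATES))  # prints L
```

In this toy set, "1" and "L" share the same feature code, so the first level alone cannot separate them and the decision falls to template matching, which mirrors the role the abstract assigns to the second level.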

  • 【Classification number】TP391.4
  • 【Times cited】4
  • 【Downloads】308