

Research on Off-line Handwritten Nüshu Character Recognition Technology

【Author】 Wan Chen (万晨)

【Advisor】 Wang Jiangqing (王江晴)

【Author Information】 South-Central University for Nationalities (中南民族大学), Computer Application Technology, 2011, Master's thesis

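【Abstract (translated from Chinese)】 Nüshu is the most gender-conscious script in the world and is of great value for the protection of intangible cultural heritage. To date, Nüshu documents have been handed down mainly by manual copying. As the inheritors of Nüshu pass away one after another, collecting and organizing Nüshu documents is becoming more difficult, and Nüshu culture is on the verge of disappearing. To address this problem, this thesis applies off-line handwritten character recognition technology to the digitization of Nüshu documents, as a contribution to protecting and promoting Nüshu, a precious part of the heritage of Chinese civilization. Based on a detailed analysis of current off-line handwritten character recognition algorithms, the thesis proposes an off-line handwritten Nüshu character recognition scheme tailored to the characteristics of Nüshu. Starting from the design of the scheme, it analyzes in detail the workflow of off-line handwritten Nüshu character recognition, the function of each part, and the commonly used algorithms. It applies the peripheral direction contribution feature extraction algorithm to Nüshu characters, proposes an improved stroke density feature extraction algorithm and a three-level distance classification algorithm, and designs and implements a practical Nüshu recognition system. The main work and features of the thesis are as follows: 1) For the Nüshu samples, smoothing and binarization algorithms are used to remove the grid noise and background from the sample images. Based on how characters are distributed in the samples, a row-merging segmentation algorithm is used to segment the Nüshu characters. Finally, the segmented characters are normalized to a uniform size. 2) The characteristics of two stroke density feature extraction algorithms, and their shortcomings when applied to Nüshu characters, are analyzed. The peripheral direction contribution feature extraction algorithm is applied to Nüshu characters, and an improved stroke density feature extraction algorithm is proposed to account for the slant of Nüshu characters. 3) Existing multi-level distance classifiers are analyzed, and a three-level distance classifier is designed to address the shortcomings of Euclidean distance in recognition. The first level uses Manhattan distance, and the second and third levels use error-balanced distance. The classifier combines the speed of Manhattan distance classification with the ability of error-balanced distance classification to emphasize the stable parts of Nüshu character features and suppress the unstable parts. 4) Using the improved stroke density feature extraction method, the three-level distance classifier, and the other algorithms proposed in the thesis, an off-line handwritten Nüshu character recognition system is designed and implemented. Simulation experiments are carried out with the system, and the results are analyzed and compared.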
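
The stroke density features mentioned in the abstract can be illustrated with a minimal sketch. It shows only the generic baseline idea (counting how many strokes each horizontal and vertical scan line crosses in a binarized, size-normalized character image), not the improved algorithm proposed in the thesis, whose details the abstract does not give. The names `count_crossings` and `stroke_density`, the band count, and the image representation are assumptions made for illustration.

```python
from typing import List

# Assumed representation: binary image, 1 = stroke pixel, 0 = background.
Image = List[List[int]]


def count_crossings(line: List[int]) -> int:
    """Count background-to-stroke transitions along one scan line."""
    crossings = 0
    previous = 0
    for pixel in line:
        if pixel == 1 and previous == 0:
            crossings += 1
        previous = pixel
    return crossings


def stroke_density(image: Image, bands: int = 4) -> List[int]:
    """Sum stroke crossings over `bands` horizontal and `bands` vertical bands.

    Returns 2 * bands values: horizontal-scan bands first, then vertical.
    """
    height, width = len(image), len(image[0])
    row_counts = [count_crossings(row) for row in image]
    col_counts = [
        count_crossings([image[y][x] for y in range(height)])
        for x in range(width)
    ]
    features = []
    for counts, size in ((row_counts, height), (col_counts, width)):
        for b in range(bands):
            start, stop = b * size // bands, (b + 1) * size // bands
            features.append(sum(counts[start:stop]))
    return features
```

Scanning only along rows and columns is the simplest choice. The abstract notes that Nüshu characters are slanted, which is the motivation it gives for the improved variant.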

【Abstract】 Nüshu is the most gender-conscious script in the world and is of great value for the protection of intangible cultural heritage. So far, Nüshu documents have been handed down mainly by manual transcription. As the inheritors of Nüshu pass away one after another, collecting and compiling Nüshu literature is becoming more difficult, and Nüshu culture is endangered. To address this problem, this thesis applies off-line handwritten character recognition technology to the digitization of Nüshu literature, so as to protect and promote Nüshu, a precious part of Chinese civilization. Based on a detailed analysis of current off-line handwritten character recognition algorithms, an off-line handwritten Nüshu character recognition scheme tailored to the features of Nüshu is proposed. Starting from the design of the scheme, the workflow of off-line handwritten Nüshu character recognition, the function of each part, and the commonly used methods are analyzed in detail. The peripheral direction contribution feature extraction algorithm is applied to the feature extraction of Nüshu characters, an improved stroke density feature extraction algorithm and a three-level distance classification algorithm are proposed, and a practical Nüshu recognition system is designed and implemented. The main work and features are as follows: 1) For the Nüshu samples, smoothing and binarization algorithms are used to remove the background and noise from the sample images. Based on the distribution of Nüshu characters in the samples, a row-merging segmentation algorithm is used to segment the characters. Finally, the segmented Nüshu characters are normalized to a uniform size. 2) The characteristics of two stroke density feature extraction algorithms, and their shortcomings when applied to Nüshu characters, are analyzed. The peripheral direction contribution feature extraction algorithm is applied to the feature extraction of Nüshu characters, and an improved stroke density feature extraction algorithm is proposed to account for the slant of Nüshu characters. 3) Existing multi-level distance classifiers are analyzed, and a three-level distance classifier is designed to address the shortcomings of Euclidean distance in the recognition process. The first level uses Manhattan distance, and the second and third levels use error-balanced distance. The classifier combines the speed of Manhattan distance classification with the ability of error-balanced distance classification to emphasize the stable parts of Nüshu character features and suppress the unstable parts. 4) An off-line handwritten Nüshu character recognition system is designed and implemented based on the improved stroke density feature extraction method and the three-level distance classifier. Simulation experiments are carried out with the system, and the experimental results are analyzed and compared.
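
The coarse-to-fine structure of the three-level distance classifier can be sketched as follows. This is a hedged illustration, not the thesis's implementation. The abstract does not define the error-balanced distance, so `weighted_distance` substitutes a spread-normalized absolute difference that has the property the abstract describes (stable features count more, unstable ones less). The template format, the shortlist sizes `k1` and `k2`, and the use of a leading feature subset (`coarse_dims`) at the second level are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
# Hypothetical template format: class label -> (mean vector, per-feature spread).
Templates = Dict[str, Tuple[Vector, Vector]]


def manhattan(x: Vector, mean: Vector) -> float:
    """Level-1 distance: cheap sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, mean))


def weighted_distance(x: Vector, mean: Vector, spread: Vector,
                      eps: float = 1e-6) -> float:
    """Stand-in for the error-balanced distance (an assumption).

    Features with a small spread (stable) are weighted up; features with a
    large spread (unstable) are weighted down.
    """
    return sum(abs(a - b) / (s + eps) for a, b, s in zip(x, mean, spread))


def classify(x: Vector, templates: Templates, k1: int = 10, k2: int = 3,
             coarse_dims: Optional[int] = None) -> str:
    """Three-level classification: shortlist, refine, then decide."""
    # Level 1: fast Manhattan shortlist of the k1 nearest classes.
    level1 = sorted(templates, key=lambda c: manhattan(x, templates[c][0]))[:k1]
    # Level 2: weighted distance on the leading `coarse_dims` features.
    d = len(x) if coarse_dims is None else coarse_dims
    level2 = sorted(
        level1,
        key=lambda c: weighted_distance(
            x[:d], templates[c][0][:d], templates[c][1][:d]),
    )[:k2]
    # Level 3: weighted distance on the full feature vector picks the winner.
    return min(level2, key=lambda c: weighted_distance(x, *templates[c]))
```

The point of the cascade is cost: the inexpensive Manhattan pass discards most classes, so the more discriminating distance is evaluated only on a short list of candidates.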
