印刷体数学公式符号切分的研究

The Study of Symbol Segmentation in Printed Mathematical Formulas

【Author】 Zhang Yan (张艳)

【Advisor】 Tian Xuedong (田学东)

【Author information】 Hebei University, Computer Application Technology, 2008, Master's

【Abstract】 Mainstream OCR (Optical Character Recognition) systems can input ordinary printed text into a computer quickly and automatically, but they cannot handle mathematical formulas, whose structures are complex and whose symbols vary widely. Mathematical formulas are an important component of scientific and technical documents, so formula recognition has become an active topic in pattern recognition. A printed mathematical formula recognition system consists of four parts: formula extraction, formula symbol recognition, formula structure analysis, and formula reconstruction. Symbol recognition is the core of the system and has two stages, symbol segmentation and symbol recognition, of which symbol segmentation is a key step. This thesis studies symbol segmentation in printed mathematical formulas. First, a symbol segmentation algorithm that can handle the two-dimensional nested structure of formulas is designed and implemented. Second, based on the recognition results and the characteristics of touching symbols, a contour-feature-based method for segmenting touching symbols in printed formulas is presented: it examines the concave and convex contours of touching symbols and the width-to-height ratio of the contours, and applies the proposed touching-symbol segmentation algorithm to split them. Experiments on documents of different print quality show that the proposed methods achieve high segmentation accuracy at a satisfactory processing speed.
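
The contour-based handling of touching symbols mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a binarized symbol image given as rows of 0/1, flags a blob as possibly touching when its width-to-height ratio exceeds a hypothetical threshold `max_aspect`, and cuts at the column where the upper and lower contours come closest together, a simple stand-in for a concavity test. All function names, the `margin` parameter, and the threshold value are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of pixels: 0 = background, 1 = ink


def contour_profiles(img: Image) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Per column, return the topmost and bottommost ink row (None if empty)."""
    height, width = len(img), len(img[0])
    top: List[Optional[int]] = []
    bottom: List[Optional[int]] = []
    for x in range(width):
        rows = [y for y in range(height) if img[y][x]]
        top.append(rows[0] if rows else None)
        bottom.append(rows[-1] if rows else None)
    return top, bottom


def looks_touching(img: Image, max_aspect: float = 1.2) -> bool:
    """Assumed heuristic: a blob much wider than tall may hold two symbols."""
    return len(img[0]) / len(img) > max_aspect


def find_cut_column(img: Image, margin: float = 0.25) -> Optional[int]:
    """Pick the column where the upper and lower contours are closest.

    Columns within `margin` of either edge are ignored so that thin
    strokes at the border of a single symbol are not chosen.
    """
    top, bottom = contour_profiles(img)
    width = len(top)
    skip = int(width * margin)
    best_x: Optional[int] = None
    best_gap: Optional[int] = None
    for x in range(skip, width - skip):
        if top[x] is None:
            return x  # an empty column already separates the symbols
        gap = bottom[x] - top[x] + 1  # distance between the two contours
        if best_gap is None or gap < best_gap:
            best_x, best_gap = x, gap
    return best_x


def split_touching(img: Image, max_aspect: float = 1.2) -> List[Image]:
    """Split a suspected touching blob into left and right parts."""
    if not looks_touching(img, max_aspect):
        return [img]
    cut = find_cut_column(img)
    if cut is None:
        return [img]
    return [[row[:cut] for row in img], [row[cut:] for row in img]]


# Demo: two 4x4 blocks joined by a one-pixel bridge in column 4.
demo: Image = [
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 1, 1],
]
parts = split_touching(demo)
```

In a fuller system the two parts would presumably be passed back to the recognizer, and the cut kept only if both halves are recognized with sufficient confidence, which is one way to read "based on recognition result" in the abstract.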

  • 【Online publication contributor】 Hebei University
  • 【Online publication issue】 2011, Issue S1