
A Study on Chinese Document Layout Analysis (中文版面分析的研究)

【Author】 Zhang Zhibin (张志彬)

【Advisors】 Tian Xuedong (田学东); Guo Baolan (郭宝兰)

【Author Information】 Hebei University, Computer Application Technology, 2002, Master's thesis

【Abstract】 Layout analysis is a preprocessing stage of a character recognition system, and its accuracy directly affects the character recognition rate. For complex Chinese document layouts, this thesis proposes a layout analysis method based on fuzzy connectedness and recognition features, and implements a complete pipeline of image input, skew correction, and text/graphics segmentation. Text/graphics segmentation mainly follows a bottom-up approach: a connected-region search algorithm detects all connected units on the page, and whether characters are merged into text lines and columns is decided by fuzzifying the connectedness of each connected unit in four directions. Punctuation marks, which strongly affect line merging, are recognized first and merged afterwards. To reduce time overhead, a local search strategy is used during both computation and merging. Experimental results show that the method achieves satisfactory segmentation on well-printed Chinese documents.
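The bottom-up step described in the abstract (find connected units, then merge them into text lines by fuzzy connectedness) can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the thesis's implementation: the binarized page is assumed to be a 0/1 list of lists, units are 8-connected components found by breadth-first search, and `right_connectedness` is a hypothetical membership function (fuzzy AND of vertical overlap and a horizontal gap that decays linearly to zero at one character height). Only the rightward direction is modelled; the four-direction connectedness, skew correction, and punctuation-first recognition described above are not. The greedy "compare only with the last unit of each open line" rule is a simplified stand-in for the thesis's local search strategy.

```python
from collections import deque
from statistics import median
from typing import List, NamedTuple


class Box(NamedTuple):
    """Bounding box of a connected unit; all coordinates are inclusive."""
    top: int
    left: int
    bottom: int
    right: int


def connected_units(img: List[List[int]]) -> List[Box]:
    """Find 8-connected foreground (value 1) regions with BFS and return their boxes."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append(Box(top, left, bottom, right))
    return boxes


def right_connectedness(a: Box, b: Box, char_h: float) -> float:
    """Hypothetical fuzzy degree to which b continues a's text line to the right.

    Fuzzy AND (min) of two memberships: vertical overlap relative to the shorter
    box, and a horizontal gap that decays linearly to 0 at one character height.
    """
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top) + 1
    shorter = min(a.bottom - a.top, b.bottom - b.top) + 1
    mu_overlap = max(0.0, overlap / shorter)
    gap = b.left - a.right - 1
    mu_gap = 1.0 if gap <= 0 else max(0.0, 1.0 - gap / char_h)
    return min(mu_overlap, mu_gap)


def merge_lines(boxes: List[Box], threshold: float = 0.4) -> List[Box]:
    """Greedily merge units into text lines, scanning left to right.

    Simplified local search: each unit is scored only against the last unit of
    each open line, not against every other unit on the page.
    """
    char_h = median(b.bottom - b.top + 1 for b in boxes)  # estimated character height
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b.left, b.top)):
        scores = [right_connectedness(line[-1], box, char_h) for line in lines]
        best = max(range(len(lines)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            lines[best].append(box)
        else:
            lines.append([box])
    # Each text line is reported as the union of its units' bounding boxes.
    merged = [Box(min(b.top for b in ln), min(b.left for b in ln),
                  max(b.bottom for b in ln), max(b.right for b in ln)) for ln in lines]
    return sorted(merged)


if __name__ == "__main__":
    # Toy page: two rows of two 2x2 "characters" separated by blank space.
    page = ["11011", "11011", "00000", "00000", "11011", "11011"]
    img = [[int(ch) for ch in row] for row in page]
    print(merge_lines(connected_units(img)))
```

On the toy page the four units collapse into two line boxes, `Box(top=0, left=0, bottom=1, right=4)` and `Box(top=4, left=0, bottom=5, right=4)`. Columns could be handled symmetrically by swapping the roles of rows and columns; the threshold of 0.4 is an arbitrary illustrative value.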

  • 【Submitted for Online Publication by】 Hebei University
  • 【Online Publication Issue】 2004, No. 02
  • 【CLC Number】 TP311.52
  • 【Times Cited】 16
  • 【Downloads】 348