
数字视频中文本的提取方法研究

Research on Text Extraction in Digital Video

【Author】 Wang Zhen

【Advisor】 Wei Zhiqiang

【Author information】 Ocean University of China, Physical Oceanography, 2011, PhD

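【Abstract (translated from the Chinese)】 The analysis and retrieval of video content has become an active topic in video information research. Text that appears in video is closely tied to the content and offers important cues for understanding and retrieving it, so extracting that text quickly and accurately is a worthwhile research direction. In addition, combined with mobile digital devices (digital camcorders, digital cameras, PDAs, mobile phones, and so on), video text extraction plays a growing role in automatic translation, navigation for blind people, robot vision, and intelligent transportation, and it has drawn increasing attention from researchers. Extracting text from video is not simple: text in video frames usually sits on a complex background, and a single image may contain text in different fonts, colors, sizes, and layouts, which makes detecting, localizing, and segmenting video text difficult. This dissertation studies several key problems in the video text extraction framework, namely text localization, tracking, and enhancement, together with two practical applications (automatic news story segmentation and a text recognition system for road traffic signs). The main contributions are as follows.

A text localization method is proposed that combines gray-scale morphology with wavelet multi-scale decomposition and reconstruction. Drawing on the strengths of both morphology and wavelet analysis in edge detection, the method first extracts the edge pixels of a video frame, then merges them into candidate text regions with a "density-based" region growing algorithm. Finally, the candidate regions are verified by a classifier in which a BPSO algorithm performs feature selection and SVM parameter optimization simultaneously. This avoids the drawbacks of optimizing only the features or only the classifier parameters and gives good classification results.

A tracking algorithm for static and linearly moving text is proposed, based on edge corner points with an improved Hausdorff distance as the decision criterion. The binary image produced by an edge operator is denoised and thinned, the extracted edge corner points are taken as the feature point set, and point pattern matching under the improved Hausdorff distance tracks the position of the text region across adjacent video frames. Experimental results show that tracking by point pattern matching is more accurate than tracking by matching all image pixels. Because the algorithm does not need to localize text in every frame, system efficiency improves considerably. Building on text tracking, a foreground/background recognition algorithm based on multi-frame fusion extracts the text strokes, which are then passed to OCR.

A news story unit segmentation method is proposed that fuses caption text in the video with multimodal audio and video information, and a prototype system for news story segmentation, browsing, and retrieval is implemented. The algorithms of Chapters 2 and 3 are first used to localize, track, and segment news caption text. Then, on the basis of shot segmentation, a Gaussian mixture model (GMM) and the KL divergence are used to classify audio shots as anchor or non-anchor. Finally, knowledge of the particular structure of news programs is used to segment the program into story units automatically.

An application of video text extraction to a driver assistance system is described. Extracting the text on road signs provides the driver with navigation information such as current location, direction, and speed limit. The algorithm first uses color information to locate road signs of specific colors, then applies an affine transformation, and finally uses a seeded region growing algorithm based on a stroke operator to localize, segment, and extract the text on the signs.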
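The abstract names a "density-based" region growing step but does not define it. The following is a minimal Python sketch of one plausible reading, not the dissertation's algorithm: a pixel counts as dense when the fraction of edge pixels in a small window around it reaches a threshold, and 4-connected dense pixels are grown into candidate boxes. The function names, the window `radius`, and the `min_density` threshold are all assumptions made for illustration.

```python
from collections import deque


def edge_density(edges, r, c, radius):
    """Fraction of edge pixels in the window centred at (r, c), clipped to the image."""
    rows, cols = len(edges), len(edges[0])
    total = count = 0
    for i in range(max(0, r - radius), min(rows, r + radius + 1)):
        for j in range(max(0, c - radius), min(cols, c + radius + 1)):
            total += 1
            count += edges[i][j]
    return count / total


def grow_text_regions(edges, radius=1, min_density=0.4):
    """Grow 4-connected regions of dense pixels (hypothetical criterion).

    Returns one bounding box per region as (top, left, bottom, right), inclusive.
    """
    rows, cols = len(edges), len(edges[0])
    dense = [[edge_density(edges, r, c, radius) >= min_density
              for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not dense[r][c] or seen[r][c]:
                continue
            # Breadth-first growth from an unvisited dense seed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                i, j = queue.popleft()
                top, left = min(top, i), min(left, j)
                bottom, right = max(bottom, i), max(right, j)
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and dense[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy binary edge map: a 2x4 block of edge pixels on an empty 6x8 frame.
edges = [[0] * 8 for _ in range(6)]
for i in (2, 3):
    for j in (2, 3, 4, 5):
        edges[i][j] = 1

candidate_boxes = grow_text_regions(edges)
```

In a full pipeline the resulting boxes would only be candidates; the abstract assigns the final text/non-text decision to the SVM classifier.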

【Abstract】 The rapid growth of video resources has created an urgent demand for efficient video classification and retrieval systems that help users find videos or clips of interest in large volumes of unstructured video data. Among the relevant techniques, text extraction has become an important research topic because the text in video frames is closely related to the video content. In addition, many mobile devices now carry high-performance cameras, so images and videos containing text are easy to capture. If this text can be detected automatically, many practical applications become possible, such as translation, assistance for blind people, machine vision, and intelligent traffic systems. However, text embedded in video frames varies in size, style, direction, and arrangement, and often has low contrast and a complex background, all of which make text extraction difficult. This dissertation addresses key problems in video text extraction: text localization in a single frame, multi-frame text tracking, text enhancement, and two applications (news video story segmentation and text detection on road signs). The main contributions are as follows.

An edge detection approach combining gray-scale mathematical morphology with the wavelet transform is proposed for coarse filtering. It fuses the edge information obtained by the two methods, suppressing noise effectively while keeping edges continuous and clear. A density-based region growing method then joins the edge pixels into candidate text regions. Finally, an algorithm based on binary particle swarm optimization (BPSO) simultaneously optimizes the feature selection and the parameters of an SVM, which is used to separate true text from the other candidates. Experimental results show that the approach detects text lines quickly and robustly under various conditions.

A method for video text tracking and text extraction under complex backgrounds is proposed. Based on corner detection using a curvature function, a point matching method tracks text objects, with a modified Hausdorff distance used to find and register the corresponding text block across video frames. The algorithm avoids detecting text in every frame, which improves system efficiency considerably. A multi-frame foreground/background recognition algorithm then extracts the text strokes for optical character recognition. Experiments on TV series and movies demonstrate the efficiency and robustness of both the point matching tracker and the text extraction algorithm.

A novel scheme for automatic news story segmentation based on video, audio, and text information is proposed. First, the shot boundaries of the news video are detected, and topic-caption frames are identified with the text detection and tracking algorithms of the preceding chapters to provide segmentation cues. Next, using a Gaussian mixture model and the KL divergence, the audio of each shot is classified as announcer or non-announcer. Finally, story units are segmented using knowledge of the particular structure of news programs.

A fast and robust approach to extracting text on road signs, based on color and stroke information, is proposed. First, a novel color model derived from the Karhunen-Loeve (KL) transform is applied to find all road sign candidates. Then an affine transformation rectifies each candidate so that the sign appears perpendicular to the camera's optical axis, which improves the accuracy of detecting the text on it. Finally, mathematical morphology and region growing algorithms produce a cleaner binary image, which is sent to OCR software. Experimental results demonstrate the robustness and efficiency of the proposed algorithm.
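The tracking contribution above relies on a modified Hausdorff distance between corner point sets, but the abstract does not say which modification is used. As a hedged illustration only, the Python sketch below assumes the common mean-of-minimum-distances variant and a brute-force search over small integer translations. The helper names, the `search` window, and the sample corner coordinates are hypothetical.

```python
import math


def directed_mean_min_distance(a, b):
    """Mean, over points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Assumed variant: the larger of the two directed mean-min distances."""
    return max(directed_mean_min_distance(a, b),
               directed_mean_min_distance(b, a))


def track_by_translation(corners, next_corners, search=4):
    """Try every integer shift in a (2*search+1)^2 window.

    Returns ((dx, dy), score) for the shift with the smallest distance.
    """
    best = None
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            shifted = [(x + dx, y + dy) for x, y in corners]
            score = modified_hausdorff(shifted, next_corners)
            if best is None or score < best[1]:
                best = ((dx, dy), score)
    return best


# Hypothetical corner points of a caption in frame t, and the same
# caption moved 3 pixels to the right in frame t+1.
frame_t = [(10, 20), (14, 20), (18, 25), (22, 20), (30, 24)]
frame_t1 = [(x + 3, y) for x, y in frame_t]

shift, score = track_by_translation(frame_t, frame_t1)
```

Averaging the nearest-neighbor distances, rather than taking their maximum as the classical Hausdorff distance does, makes the score less sensitive to a single spurious corner, which is one reason such a variant might be preferred for noisy edge maps.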
