Text Extraction in Video (视频文本的提取)

【Author】 Zhang Dongping (章东平)

【Advisor】 Liu Jilin (刘济林)

【Author information】 Zhejiang University, Communication and Information Systems, 2006, Ph.D.

【Abstract】 Text in digital video can provide important supplemental information for retrieval and indexing. In some cases the text in a clip contains information that is found nowhere else, such as movie credits; in other cases it is an important and concise supplement, such as sports scores or stock prices. If text in digital video can be extracted and recognized robustly, many high-level applications, such as video summarization, can be realized more effectively.

This dissertation studies several aspects of text extraction in digital video: text localization, tracking, enhancement, and segmentation. Compared with document images, text extraction in video is challenging because of low resolution, complex backgrounds, lighting variation, and unconstrained position, shape, and color.

A text line localization method that combines the compressed domain and the spatial domain is presented. Text regions are detected directly in the DCT domain using the texture energy of each DCT block. Text lines are then extracted from the horizontal projection profile of the differential image of each text region.

A text tracking method based on template matching with an M-estimator is proposed. The matching template is obtained by coarsely segmenting the text region with the Logical Level Technique (LLT). The location of the search window is estimated from the motion vectors in the MPEG-2 bitstream. A multi-resolution method based on the winner-update strategy is adopted to speed up the template matching.

A multi-frame integration method is used to increase the contrast between text and background. Based on the intensity distribution of each pixel of the text region over time, the method chooses between multi-frame averaging and multi-frame minimum or maximum to enhance the text region.

A text segmentation algorithm based on a color stroke model is proposed. The color stroke model describes the local topographical features of characters in color space. The algorithm consists of two parts: binarization of the text region and connected-component analysis.
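
The multi-frame enhancement step described above can be illustrated with a minimal sketch. The abstract does not state the actual decision rule, so everything specific here is an assumption: the function name `enhance_text_region`, the `text_is_bright` flag, the `static_std` threshold, and the choice to decide once per region from the mean temporal standard deviation of its pixels. It is not the dissertation's algorithm, only one plausible reading of "choose averaging or minimum/maximum from the temporal intensity distribution".

```python
from statistics import mean, pstdev


def enhance_text_region(frames, text_is_bright=True, static_std=5.0):
    """Fuse aligned grayscale crops of one text region into a single image.

    frames: list of 2-D lists (rows of pixel intensities), all the same size.
    text_is_bright and static_std are hypothetical parameters of this sketch.
    Returns the fused image and the name of the fusion rule that was used.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Collect the temporal samples for every pixel position.
    series = [[[f[y][x] for f in frames] for x in range(width)]
              for y in range(height)]
    # Mean temporal spread over the region; a small value suggests a static scene.
    spread = mean(pstdev(s) for row in series for s in row)
    if spread < static_std:
        fuse = mean   # assumed rule: static background, averaging reduces noise
    elif text_is_bright:
        fuse = min    # assumed rule: bright text, minimum darkens a changing background
    else:
        fuse = max    # assumed rule: dark text, maximum brightens a changing background
    fused = [[fuse(s) for s in row] for row in series]
    return fused, fuse.__name__
```

For example, with three one-row crops in which the left pixel is stable bright text and the right pixel is a changing background, `enhance_text_region([[[200, 50]], [[200, 120]], [[200, 90]]])` selects the minimum rule, which keeps the text at 200 and pushes the background down to 50.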

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year/issue】 2006, Issue 09