视频点播系统中的视频检索研究

Research on Video Retrieval for VOD System

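【Author】 闫君飞 (Yan Junfei)

【Advisor】 吴刚 (Wu Gang)

【Author information】 University of Science and Technology of China, Network Communication Systems and Control, 2008, PhD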

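【Abstract, translated from the Chinese】 Video on demand (VOD) is a networked multimedia application built on streaming media technology, and it has attracted wide attention from video researchers for many years. In a VOD system, users want to consume video adaptively: at any time, in any place, and in any manner. Content-based video retrieval (CBVR) plays a prominent role in meeting these needs. However, traditional CBVR methods have shortcomings and rarely achieve ideal results. The main problems are that querying by video clip cannot generally satisfy users' individual needs, and that the video features extracted by retrieval systems are weakly relevant and not general enough to capture accurately the semantic information users are after.

Image segmentation is a key technique for extracting video semantics, and this dissertation proposes an image segmentation method for video retrieval. The method uses the color information of the region of interest in a given example image to estimate color statistical models for the foreground and background of the image to be segmented, and computes for each pixel its similarity to the foreground and to the background. It combines this with histogram matching of the target foreground and a contrast description of natural segmentation boundaries to obtain an image segmentation framework that is solved by optimization. Because the color likelihood of every pixel in the image to be segmented is characterized, the method overcomes the effects of illumination changes, changes in the proportions of foreground color patches, and changes in foreground size on segmentation accuracy. Experimental results show that, compared with segmentation methods that consider only histogram matching, the method is more generally applicable and can be used effectively in a video retrieval system.

Building on the key-frame segmentation algorithm, a video retrieval method for VOD systems is proposed. The user submits the region of interest in a single image as the query. The server segments the stored key frames of the video summaries accordingly, judges the similarity between each frame and the query image, and returns the results for the user to choose from and play. The query can come from a film poster, behind-the-scenes footage, and so on. Because only the part the user is interested in is processed, no constraint is placed on the color of the background (the area outside the region of interest). Experimental results show that the method accurately locates the results the user expects, retrieving related videos of the same kind as the query alongside the source video, and is suitable for VOD systems.

Existing video retrieval systems mostly extract low-level features (such as color and texture), which are not general enough to give ideal retrieval results. This dissertation proposes a video retrieval method based on the scale-invariant feature transform (SIFT). The user submits the region of interest in a single image as the query. Using the scale-invariant feature points inside that region, the method matches feature points between the query and the frames of the video summary clips on the directory server. On this basis, a video ranking method is proposed to locate and play the videos the user is looking for. Experimental results show that the method accurately finds the source video of the query and related videos of the same kind, is unaffected by rotation and scaling of the foreground object, is insensitive to changes in illumination, and can be applied effectively in VOD systems.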
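The per-pixel similarity step described in the abstract above can be illustrated with a minimal Python sketch. It assumes quantized RGB histograms as the color statistical models and a log-likelihood ratio as the similarity score. The function names, the bin count, the smoothing constant, and the use of hand-picked background samples are all hypothetical choices for illustration; the histogram-matching and contrast terms of the full optimization are not modeled.

```python
import math
from collections import Counter

BINS = 4  # assumed quantization: 4 levels per RGB channel


def quantize(pixel, bins=BINS):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // bins
    return tuple(channel // step for channel in pixel)


def color_model(pixels, bins=BINS, eps=1.0):
    """Estimate a smoothed color histogram (probability per bin) from samples."""
    counts = Counter(quantize(p, bins) for p in pixels)
    total = len(pixels) + eps * bins ** 3  # additive smoothing over all bins
    return lambda pixel: (counts[quantize(pixel, bins)] + eps) / total


def foreground_score(pixel, fg_model, bg_model):
    """Log-likelihood ratio: positive means the pixel looks more like foreground."""
    return math.log(fg_model(pixel)) - math.log(bg_model(pixel))


# Toy data (assumed): a reddish region of interest and bluish background samples.
roi_pixels = [(220, 30, 40), (200, 20, 35), (240, 50, 60)]
background_pixels = [(20, 40, 210), (30, 60, 230), (10, 30, 200)]
fg = color_model(roi_pixels)
bg = color_model(background_pixels)

# Label two pixels of a hypothetical key frame by the sign of the score.
labels = ["fg" if foreground_score(p, fg, bg) > 0 else "bg"
          for p in [(230, 40, 50), (25, 50, 220)]]
```

In a graph-cut formulation, scores like these would typically serve as the per-pixel data term, with the contrast description supplying the pairwise term; that pairing is a common design and is stated here as an assumption, not as the dissertation's exact energy function.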

【Abstract】 Video on demand (VOD) is an important multimedia application based on streaming media technology, and it has attracted wide attention from video researchers. Users of a VOD system want to consume video adaptively: at any time, anywhere, and in any format. Content-based video retrieval (CBVR) plays a prominent role in meeting these demands. Unfortunately, traditional CBVR methods have shortcomings. The main ones are that querying by video segment cannot satisfy users' general and individual demands, and that the video features extracted by the retrieval system lack relevance and generality, so they cannot represent the semantic information users want. These problems keep CBVR from achieving ideal performance, and the methods proposed in this dissertation address them.

Image segmentation is a key technique for extracting video semantics. Chapter 3 proposes a novel image segmentation method for video retrieval. The method first uses the color information of the region of interest in an example image to estimate color distribution models for the foreground and background of the image to be segmented. Then, for each pixel, it estimates the similarity to the foreground and to the background. By integrating this per-pixel estimate with histogram matching of the target foreground and a contrast description, it constructs a novel graph-cut optimization framework. Because it characterizes the color likelihood of every pixel, the method is more robust than other algorithms to illumination changes and to changes in the scale or color proportions of the foreground. Experiments show that it is more effective than the traditional histogram matching algorithm and better suited to video retrieval.

Chapter 4 first points out the shortcomings of the traditional query format based on video segments. Then, building on the key-frame segmentation algorithm proposed in Chapter 3, it proposes a novel query style for VOD. The user submits the region of interest (ROI) of a single image as the query. The system server, which stores the video summaries for the whole system, segments the frames in the video summaries according to the query and computes the distance between each frame and the query. The server can then locate and play the video the user requested. The query can come from a poster or from a frame of the video itself. Because only the ROI is processed, no constraints are placed on the color of the background (the area outside the ROI). Experiments show that this is a novel and effective style of video retrieval that accurately locates both the source video of the query and related videos of the same kind. The method can be applied in VOD systems.

Existing video retrieval systems mostly extract low-level video features (such as color and texture), which leads to poor retrieval performance. Chapter 5 proposes a video retrieval method that uses the scale-invariant feature transform (SIFT), based on the single-image ROI query format proposed in Chapter 4. The user submits the ROI of a single image as the query. The system server, which stores all the video summaries, uses SIFT to match the feature points of the query with those in the frames of the video summaries. The server can then locate and retrieve the video the user requested. Experiments show that the method accurately locates the source video of the query and related videos of the same kind, is invariant to scaling and rotation of the foreground, and is partially invariant to illumination. The method can be applied effectively in VOD systems.
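The feature-point matching and ranking described for Chapter 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: descriptors are assumed to be already extracted (for example by a SIFT implementation), matching uses a nearest-neighbor distance-ratio test, and videos are ranked by the best match count over their summary frames. The abstract does not specify the ranking rule, so the rule, the function names, the `ratio` threshold, and the data layout are hypothetical.

```python
import math


def ratio_test_matches(query_desc, frame_desc, ratio=0.8):
    """Count query descriptors whose nearest frame descriptor is clearly
    closer than the second nearest (distance-ratio test)."""
    matches = 0
    for q in query_desc:
        dists = sorted(math.dist(q, f) for f in frame_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def rank_videos(query_desc, summaries, ratio=0.8):
    """Rank videos by the best match count over their summary key frames.

    `summaries` maps a video id to a list of key frames, each key frame
    being a list of descriptor vectors (hypothetical data layout).
    """
    scores = {
        video_id: max((ratio_test_matches(query_desc, frame, ratio)
                       for frame in frames), default=0)
        for video_id, frames in summaries.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-D descriptors standing in for 128-D SIFT descriptors.
query = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
summaries = {
    "video_a": [[(1.0, 0.1, 0.0), (0.0, 0.9, 0.1), (5.0, 5.0, 5.0)]],
    "video_b": [[(4.0, 4.0, 4.0), (4.1, 4.0, 4.0), (3.9, 4.0, 4.1)]],
}
ranking = rank_videos(query, summaries)
```

In this toy data, `video_a` contains close counterparts of both query descriptors and ranks first, while the descriptors in `video_b` are all roughly equidistant from the query and fail the ratio test.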

  • 【Classification number】TN948.64
  • 【Times cited】4
  • 【Downloads】859