基于内容的视频检索关键技术研究

Research on Some Key Techniques of Video Retrieval Based on Content

【Author】 雷少帅

【Advisor】 谢刚

【Author information】 Taiyuan University of Technology (太原理工大学), Circuits and Systems, 2012, Ph.D.

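【Abstract, translated from the Chinese】 With the widespread use of video and the rapid growth in its volume, organizing and managing video data effectively has become an important and challenging research topic. Because of the particular physical structure of video, traditional text search methods no longer apply. To turn video retrieval into text retrieval, the video must be preprocessed: it is first segmented into individual shots, several key frames are then selected from each shot to represent its content, and finally the static frames are annotated with semantic concepts, so that video retrieval becomes an operation on text. This dissertation studies several key techniques of content-based video retrieval in depth, namely shot boundary detection, key frame extraction, low-level image feature selection, and image semantic recognition. The main contributions are as follows:
1. In shot boundary detection, existing methods locate shot boundaries from the difference between adjacent frames, but that difference is sensitive to flashes and to object and camera motion, so detection accuracy is low. This dissertation proposes a shot boundary detection method based on distance separability, which determines shot boundaries from the difference between two adjacent video segments. It effectively suppresses flashes and object/camera motion, and distinguishes flashes from cuts and motion from gradual transitions.
2. Existing key frame extraction methods lack spatio-temporal analysis of the video and have difficulty determining the number and positions of key frames globally. This dissertation proposes two key frame extraction methods that work from the spatio-temporal characteristics of the video. The first splits a shot into several sub-shots of similar content, then recasts key frame selection within each sub-shot as finding a maximal linearly independent set of a matrix; it chooses key frames according to how fast the sub-shot content changes and reflects the dynamics of the video with very low redundancy. The second constructs spatio-temporal slices to extract the spatio-temporal information of the original video, then extracts key frames through clustering and rule definition. Both methods incorporate the spatio-temporal characteristics of the video, and their results agree well with human visual perception.
3. Key frames serve both to build video summaries and to index video segments. Most existing methods are oriented toward video summarization; when their results are used for indexing, redundancy is excessive and retrieval efficiency is low. This dissertation proposes an indexing-oriented key frame extraction method that selects key frames by studying camera motion and shot presentation techniques. It first constructs a motion direction histogram, uses it for qualitative analysis of global camera motion, and finally extracts key frames according to the characteristics of the global motion. Experimental results show that the method captures the main video content and provides a compact index structure for later retrieval.
4. Existing feature selection methods do not tie the knowledge system closely to classification; they cannot discover and reason about relationships among data features, nor handle inconsistent and incomplete information effectively to discover implicit knowledge and reveal underlying regularities. This dissertation applies the idea of knowledge reduction to image semantic feature extraction: by building an attribute decision table and reducing attributes without affecting the knowledge, it extracts an effective low-level feature set that lays the foundation for image semantic recognition.
5. This dissertation examines the classification performance of support vector machines (SVM) for semantic recognition of natural landscape images. Because SVM performance is determined jointly by the kernel function and its parameters, the dissertation analyzes how different kernel functions and parameter optimization algorithms affect semantic classification of natural landscape images, and finally performs image semantic recognition with the optimized SVM on the reduced low-level feature set, achieving a high recognition accuracy.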
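
The first key frame method above treats key frame selection within a sub-shot as finding a maximal linearly independent set of a matrix. The abstract does not say how the matrix is built or how the set is computed, so the following is only a minimal sketch under stated assumptions: each frame is a feature vector (one row of the matrix), and a greedy modified Gram-Schmidt pass keeps a frame only when it adds a new direction. The function name `select_independent_frames` and the relative tolerance `tol` are hypothetical and not taken from the dissertation.

```python
import math


def select_independent_frames(features, tol=1e-6):
    """Greedily return indices of frames whose feature vectors are
    linearly independent of the frames already kept.

    features: list of equal-length numeric lists, one per frame (assumed).
    tol: relative tolerance (hypothetical parameter); a frame is dropped
         when its residual norm is below tol times its own norm.
    """
    basis = []  # orthonormal basis spanning the frames kept so far
    keep = []   # indices of the selected frames
    for idx, vec in enumerate(features):
        residual = [float(x) for x in vec]
        # Modified Gram-Schmidt: strip the component along each basis vector.
        for b in basis:
            proj = sum(r * q for r, q in zip(residual, b))
            residual = [r - proj * q for r, q in zip(residual, b)]
        norm = math.sqrt(sum(r * r for r in residual))
        scale = math.sqrt(sum(float(x) ** 2 for x in vec))
        if scale > 0 and norm > tol * scale:
            basis.append([r / norm for r in residual])
            keep.append(idx)
    return keep


if __name__ == "__main__":
    frames = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    print(select_independent_frames(frames))
```

Real frame features are almost never exactly dependent, so in practice `tol` would act as a similarity threshold: a larger value drops frames that add little beyond those already selected, which is one plausible reading of how slowly changing sub-shots could yield fewer key frames.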

【Abstract】 A video is a continuous time series of image frames, an image stream without data structure. If a video is viewed as a book without a table of contents or index, then each frame corresponds to one page of the book. Without a table of contents, people cannot browse and retrieve efficiently. With the extensive application of video information and the dramatic increase in the number of videos, effective organization and management has become an important and challenging research topic. This dissertation studies in depth several key technologies of content-based video retrieval, including shot boundary detection, key frame extraction, low-level image feature selection, and image semantic recognition. The main innovations are summarized as follows:

In shot boundary detection, existing methods calculate the difference between two adjacent frames to find shot boundaries, but this difference is sensitive to flashes and to object and camera motion. This dissertation presents a shot boundary detection method based on a distance separability criterion, which determines shot boundaries by calculating the difference between two video clips in a sliding window and thereby suppresses flashes and object/camera motion. The method can effectively distinguish flashes from cut transitions, and object/camera motion from gradual transitions.

Existing key frame extraction methods lack spatio-temporal analysis of the video and have difficulty identifying the number and locations of key frames as a whole. This dissertation proposes two key frame extraction methods. The first method splits a shot into several sub-shots with similar visual content, then applies spatial and temporal analysis to identify key frames according to the rate of change of the video content. This method effectively reflects the dynamic characteristics of the video. The second method constructs a spatio-temporal slice to extract the spatio-temporal information of the original video, then uses K-means clustering and a set of rules to extract key frames. Both methods include spatio-temporal analysis, and their results are consistent with human visual perception.

Key frames serve both to construct video summaries and to index video clips. Existing methods are mostly oriented toward video summarization; when their results are used for video indexing, they are too redundant, resulting in low retrieval efficiency. This dissertation argues that different extraction strategies should be applied to different applications, and uses camera motion and shot presentation techniques to extract key frames for video indexing. It presents a hierarchical camera motion classification algorithm based on a motion direction histogram; building on this qualitative analysis of camera motion, shot presentation techniques are applied to extract key frames. The experimental results show that the method captures the main video information and provides a concise index structure for later retrieval.

Existing feature extraction methods do not connect the knowledge system with classification, cannot find and reason about the relationships among individual data features, and cannot effectively handle inconsistent or incomplete information to discover implied knowledge and reveal underlying regularities. In this dissertation, knowledge reduction based on rough set theory is applied to image semantic feature extraction. By constructing an attribute decision table and reducing the attributes while leaving the knowledge unaffected, an effective low-level feature set is extracted that lays the foundation for image semantic recognition.

This dissertation investigates the classification performance of support vector machines (SVM) in the semantic recognition of landscape images. Because SVM classification performance is determined by the kernel function and its parameters, this dissertation analyzes the impact of different kernel functions and parameter optimization algorithms on the semantic recognition of landscape images. Finally, on the basis of the effective low-level feature set, the optimized SVM is used for image semantic recognition and obtains a high recognition accuracy.
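
The shot boundary idea described in the abstract, comparing two adjacent video clips in a sliding window rather than two adjacent frames, can be illustrated with a small sketch. The abstract does not give the actual distance separability criterion, so the score below is an assumption: a Fisher-style ratio of the squared distance between the two window means to the scatter within the windows. The names `separability_scores` and `synthetic_clip`, the window size, and the two-bin feature are all hypothetical.

```python
def mean_vector(frames):
    """Component-wise mean of a list of feature vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def scatter(frames, mean):
    """Average squared distance of the frames to their mean."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(f, mean)) for f in frames
    ) / len(frames)


def separability_scores(features, window=5, eps=1e-9):
    """scores[t] compares frames [t - window, t) with frames [t, t + window).

    A high score suggests a boundary between frame t - 1 and frame t.
    Positions too close to either end of the clip keep a score of 0.0.
    """
    scores = [0.0] * len(features)
    for t in range(window, len(features) - window + 1):
        left = features[t - window:t]
        right = features[t:t + window]
        ml, mr = mean_vector(left), mean_vector(right)
        between = sum((a - b) ** 2 for a, b in zip(ml, mr))
        within = scatter(left, ml) + scatter(right, mr)
        scores[t] = between / (within + eps)
    return scores


def synthetic_clip():
    """40 frames of a made-up 2-bin feature: flash at frame 7, cut at frame 20."""
    clip = []
    for i in range(40):
        base = [0.2, 0.8] if i < 20 else [0.8, 0.2]
        jitter = 0.01 * ((i % 3) - 1)  # small deterministic variation
        clip.append([base[0] + jitter, base[1] - jitter])
    clip[7] = [1.0, 1.0]  # single-frame flash
    return clip


if __name__ == "__main__":
    scores = separability_scores(synthetic_clip())
    print(max(range(len(scores)), key=scores.__getitem__))
```

In this toy clip the single-frame flash raises the within-window scatter along with the between-window distance, so its score stays low, while the cut separates two internally consistent windows and scores far higher. Gradual transitions and motion would need additional handling that the abstract does not describe.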
