Research on Key Issues in Video Information Content Management (视频信息内容管理关键问题研究)

【Author】 Li Yuenan (李岳楠)

【Advisor】 Lu Zheming (陆哲明)

【Author information】 Harbin Institute of Technology, Information and Communication Engineering, 2010, PhD

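【Abstract (Chinese original, in translation)】 With the rapid development of network communication and multimedia technology, video information has grown explosively in recent years. Video-centered applications have emerged in step, such as Internet TV, 3G video communication, video on demand and video sharing. This has profoundly changed how video information is acquired and distributed: the traditional single, passive mode of acquiring information is being replaced by diversified, interactive media services. At the same time, the growth in the volume of video information and the expansion of its application modes have exposed a number of technical and social problems. On the one hand, the demand for organizing and using video information, managing its copyright and authenticating its content keeps growing. On the other hand, conventional indexing, retrieval and information security techniques are difficult to apply directly to video. How to develop complete and efficient video content management mechanisms tailored to the characteristics of video information has therefore become a topic of wide interest in academia and the multimedia industry. Starting from the basic characteristics of video information and the needs that arise in its applications, this dissertation studies key problems in video information management. The work aims to improve the usability and trustworthiness of video information by designing effective content management mechanisms. The research covers video structure parsing, video summarization, video content identification and video content authentication. The work and contributions of this dissertation are as follows.

(1) A fast, general framework for shot boundary detection is proposed to address the high computational complexity of existing shot boundary detection algorithms. The work is not tied to a specific cut or gradual-transition detection algorithm; it aims at a generally applicable framework that improves detection efficiency. The framework uses several pre-processing techniques to discard non-boundary regions and to predict the attributes of shot boundaries. In addition, a fast shot detection algorithm that runs in parallel with video coding is proposed; it uses the side information produced during video encoding to assist shot detection. Simulation experiments show that the proposed algorithms significantly improve shot detection efficiency while achieving satisfactory detection accuracy.

(2) A video summarization algorithm based on a visual attention model and online clustering is proposed. Building on a detailed analysis of the visual neurophysiological mechanisms by which attention forms, the attention model is introduced into key frame extraction. Key objects within a frame are detected automatically by simulating the role that each functional unit of the visual system plays in forming attention, and these objects serve as the basis for key frame extraction. To keep the key frame set concise, reduce storage requirements and enable real-time summary display, an online clustering scheme for region-of-interest features is proposed. Simulation experiments show that the algorithm is content adaptive and that the extracted key frame set agrees reasonably well with subjective observation.

(3) A video identification algorithm based on spatio-temporal salient points is proposed. Using the Harris salient point detector and motion trajectory tracking, the spatial saliency and temporal stability of salient points are measured, and the most stable spatio-temporal salient points are selected as the features for video identification. The Hausdorff distance is introduced into feature matching to cope with the unordered nature of salient points. A video identification algorithm based on non-negative matrix factorization (NMF) is also proposed, and the NMF algorithm under the Euclidean norm criterion is derived. On this basis, NMF is used to extract basis images that jointly reflect the essential spatio-temporal content of the video, and these basis images serve as the starting point for video identification. Experimental results show that the two proposed algorithms achieve accurate video identification and outperform the algorithms they are compared with. In addition, the spatio-temporal salient points effectively resist the effect of geometric distortion on video identification.

(4) The application of robust hash functions to video content authentication is studied, and the extension of the concept and application domains of hash functions is described. A robust hash function based on random Gabor filtering and dithered lattice vector quantization is proposed. Rotation-invariant Gabor filters are constructed to strengthen the robust hash against rotation. To secure feature extraction, a key-dependent random Gabor filtering scheme is proposed, and the relationship between security and randomness in robust hash functions is discussed. To address the limitations of existing quantizers, a quantization scheme based on dithered lattice vector quantization is proposed, and its effectiveness is argued through theoretical analysis and experimental validation. Experiments and analysis show that the algorithm performs well in both robustness and discrimination, and that its robustness to rotation in particular is clearly better than that of representative algorithms. In addition, a robust hash function based on the spatio-temporal energy relationships of video is proposed for the specific characteristics of video. The algorithm uses random pixel-block partitioning and a three-dimensional signal transform to extract energy relationships between different regions of the video. Compared with existing robust video hash functions, it offers significantly better robustness. The analysis also shows that its feature extraction stage has high randomness.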
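The abstract states that the Hausdorff distance is used for feature matching because salient points carry no natural ordering. The sketch below illustrates only that property of the metric. It assumes, purely for illustration, that each salient point is a 2-D coordinate tuple; the dissertation's actual feature representation is not given here, and the function names `directed_hausdorff` and `hausdorff` are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Assumed toy data: salient points as (x, y) tuples.
reference = [(0.0, 0.0), (1.0, 0.0)]
shuffled = [(1.0, 0.0), (0.0, 0.0)]            # same points, different order
with_outlier = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

print(hausdorff(reference, shuffled))      # 0.0: point order does not matter
print(hausdorff(reference, with_outlier))  # 3.0: driven by the extra point
```

Because the distance is defined over sets, reordering the points leaves it unchanged, which is the reason the abstract gives for choosing it.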

【Abstract】 With the rapid development of network communication and multimedia technologies, the amount of video data has grown explosively. At the same time, video-oriented applications such as Internet TV, 3G video communication, video on demand (VOD) and video sharing have kept emerging in recent years. Consequently, the vast amount of video information and the extension of application scenarios have led to significant changes in the ways video is acquired, used and distributed. The conventional monotonous and passive modes of video acquisition are being replaced by diverse and interactive multimedia services. Meanwhile, the ever-increasing volume of video information also gives rise to a series of technological and social problems. There is a strong demand for techniques for video organization, utilization, copyright management and content authentication. However, conventional indexing, retrieval and information security techniques cannot simply be extended to video information. Therefore, developing efficient and effective video information management techniques has become a major topic of interest in both academia and the multimedia industry.

Taking the characteristics of video information as the point of departure, this dissertation addresses the technical issues arising from video applications. The principal goal of this dissertation is to design effective content management schemes that enhance the usability and trustworthiness of video information. The research focuses on video structure parsing, video abstraction, video identification and content authentication. The main work and contributions of this dissertation are as follows.

A fast shot boundary detection framework that employs pre-processing techniques is proposed. The aim of this work is not to design a specific hard cut or gradual transition (GT) detection method. Instead, we concentrate on a generally applicable framework that improves the efficiency of shot boundary detection. Several pre-processing techniques are incorporated in the framework to eliminate non-boundary frames and predict the attributes of potential shot boundaries. We also propose a fast shot boundary detection paradigm that runs in parallel with video coding. The side information generated by the video encoder is exploited to facilitate shot boundary detection, so the detector can dispense with the computationally intensive feature extraction procedure. Experimental results indicate that both proposed schemes effectively improve the efficiency of shot boundary detection while maintaining detection accuracy at a satisfactory level.

To facilitate video browsing, a video abstraction algorithm based on an attention model and on-line clustering is developed. We first investigate the neurophysiological mechanisms of human visual attention, on the basis of which a visual attention model is employed in video abstraction. Regions of interest (ROIs) are detected in each representative frame by simulating the functions that the components of the human visual system perform in forming attention. To reduce memory consumption and achieve on-the-fly key frame presentation, an on-line clustering scheme is proposed. Simulations show that the proposed key frame extraction algorithm is content adaptive and that the extracted key frames are reasonably consistent with human perception.

We also present a video identification algorithm based on spatial-temporal salient points. The salient points are detected with the aid of the Harris detector and trajectory tracking techniques. The stability of each salient point is evaluated in both the spatial and temporal domains, and the points with the highest spatial saliency and temporal stability are selected as the features for video identification. To cope with the arbitrary order of salient points, the Hausdorff distance is employed as the metric for feature comparison. In addition, a video identification algorithm based on non-negative matrix factorization (NMF) is proposed. The update rule of NMF under the Euclidean norm criterion is derived in this work. On this basis, NMF is performed on the input video to obtain basis images that represent the essential spatial-temporal content of the video, and video sequences are identified via the features of these basis images. It is demonstrated that the proposed algorithms achieve accurate video identification and that their performance is superior to that of state-of-the-art algorithms. In particular, the spatial-temporal salient points effectively resist geometric distortions.

The application of robust hashing to content authentication is also investigated. First, the extension of the concept of a hash function from generic data to multimedia data is elaborated. We propose a robust hash function based on random Gabor filtering and dithered lattice vector quantization (DLVQ). To enhance robustness against rotation, the conventional Gabor filter is adapted to be rotation invariant. To secure feature extraction, a key-dependent random filtering scheme is developed, and the relationship between the security and the randomness of robust hash functions is investigated. Considering the limitations of existing quantization schemes, a DLVQ-based quantization scheme is developed, and its effectiveness is illustrated by analytical and experimental results. The proposed robust hash function performs well in terms of both robustness and discrimination. In particular, it shows significant advantages over state-of-the-art algorithms in robustness against rotation. In addition, a video hashing algorithm based on spatial-temporal energy is developed. The energy relationships between different regions are calculated using a three-dimensional signal transform and random block partitioning. The proposed algorithm outperforms existing video hashing algorithms in terms of robustness, and analytical results show that its feature extraction exhibits a high degree of randomness.
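The abstract mentions deriving the NMF update rule under the Euclidean norm criterion without giving it. As a point of reference only, the sketch below uses the widely known multiplicative updates for minimizing ||V - WH||^2; it is not the dissertation's own derivation. The layout (columns of `v` as vectorized frames, columns of `w` as basis images), the helper names, and the small constant `eps` guarding against division by zero are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Euclidean distance ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_wh in zip(v, wh)
               for x, y in zip(row_v, row_wh))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank), H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps every factor non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Assumed toy input: 4 "pixels" by 3 "frames", all entries non-negative.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))  # reconstruction error after the updates
```

The updates only multiply by non-negative ratios, so `w` and `h` stay non-negative throughout, which is what allows the columns of `w` to be read as images.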
