
多媒体语义提取方法及其在视频水印中的应用研究

Research on Extraction of Multimedia Semantics and Its Application in Video Watermarking

【Author】 Qin Kezhen (秦可臻)

【Supervisor】 Tong Ming (同鸣)

【Author Information】 Xidian University, Signal and Information Processing, 2010, Master's thesis

【Abstract】 With the rapid development of computer and network technology, multimedia data such as video and images are growing geometrically, and the demand for this visual content is becoming ever broader, so retrieving useful information from such vast data resources has become a current research focus. Most existing retrieval techniques, however, work on low-level visual features, which are far removed from the high-level semantic concepts people actually use, and this seriously limits retrieval performance in practice. Low-level visual features cannot accurately describe the semantic content of multimedia data; that is, a "semantic gap" lies between low-level features and high-level semantics. How to bridge this gap and extract semantic information effectively has become a pressing problem in multimedia research.

First, the thesis reviews the research status and development of content-based information retrieval (CBIR), and introduces the theory of semantic extraction and the methods in common use, including approaches based on machine learning, on feedback learning, and on domain-specific knowledge. Two typical machine-learning-based methods for image semantic extraction are studied and implemented: one based on the Support Vector Machine (SVM) and one based on the Coherent Language Model (CLM). Experimental results show that both methods extract image semantics effectively.

Second, the thesis proposes a video semantic extraction method based on fuzzy associative classification. The method introduces fuzzy concepts to overcome the "sharp boundary" problem of association rule mining; it treats associative classification rule mining as a constrained optimization problem and constructs an adaptive-penalty affinity function to assess the quality of antibodies more accurately; it adopts a hybrid double-mutation operator for better global and local search ability; and it uses an aging operator to preserve population diversity while reducing computational complexity. Applied to the extraction of video motion semantics and texture semantics, the method gives satisfactory experimental results.

Finally, the thesis applies high-level semantics to video digital watermarking and proposes a robust watermarking method for the AVS (Audio Video coding Standard) compressed domain based on video semantics. Using the extracted motion semantics, the method generates a dynamic semantic watermark online; it adaptively selects shots of interest according to the motion semantics and I-frames of interest according to the texture semantics; guided by the masking properties of the human visual system, it takes regions of intense and of slow motion as regions of interest and embeds the watermark in the mid-frequency DCT coefficients of the luminance sub-block prediction residuals of the selected I-frames, with the embedding strength controlled adaptively by the video texture features. Experiments and analysis show that the method is robust not only to common attacks but also to video-specific attacks such as frame recombination, intra-frame cropping, and frame deletion.

The thesis closes with a summary of the work and directions for further research.
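The abstract mentions SVM-based image semantic extraction only at a high level. As a point of reference, the sketch below shows the usual formulation of this family of methods — one binary SVM per semantic concept, trained on a low-level visual feature — and is only a minimal illustration under assumptions (a global color-histogram feature, the scikit-learn API); it is not the thesis's implementation.

```python
# Minimal sketch of per-concept SVM image annotation (not the thesis's exact pipeline).
# Assumes scikit-learn and NumPy are available; the RGB-histogram feature is illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def color_histogram(image_rgb: np.ndarray, bins: int = 8) -> np.ndarray:
    """Low-level feature: a joint RGB histogram flattened to a fixed-length vector."""
    hist, _ = np.histogramdd(
        image_rgb.reshape(-1, 3), bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-9)

def train_concept_detectors(images, concept_labels):
    """Train one binary SVM per semantic concept (e.g. 'sky', 'grass').

    images: list of HxWx3 uint8 arrays
    concept_labels: dict mapping concept name -> list of 0/1 labels per image
    """
    features = np.stack([color_histogram(img) for img in images])
    detectors = {}
    for concept, labels in concept_labels.items():
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
        clf.fit(features, labels)
        detectors[concept] = clf
    return detectors

def annotate(image, detectors, threshold=0.5):
    """Return the concepts whose SVM posterior probability exceeds the threshold."""
    feat = color_histogram(image).reshape(1, -1)
    return [c for c, clf in detectors.items()
            if clf.predict_proba(feat)[0, 1] >= threshold]
```

In a retrieval setting, the concept list returned by such detectors is what lets keyword-style queries reach images that were never manually annotated.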
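The "sharp boundary" problem that the fuzzy associative classification method addresses can be illustrated briefly: when a continuous attribute such as motion intensity is cut into crisp intervals, a value just inside a boundary supports only one rule, while fuzzy sets let it support neighbouring rules with graded membership. The fragment below is a minimal sketch of that single idea, with assumed triangular membership functions; it does not reproduce the thesis's immune-inspired rule miner (adaptive-penalty affinity function, hybrid double mutation, aging operator).

```python
# Illustration of fuzzifying a continuous attribute (e.g. motion intensity) so that
# values near a bin boundary contribute to two fuzzy sets instead of being forced
# into one crisp interval. The break-points below are assumed for the example.
import numpy as np

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed fuzzy sets for a motion-intensity feature normalized to [0, 1].
MOTION_SETS = {
    "low":    (-0.001, 0.0, 0.5),   # peak at 0.0
    "medium": (0.0,    0.5, 1.0),   # peak at 0.5
    "high":   (0.5,    1.0, 1.001), # peak at 1.0
}

def fuzzify(x: float) -> dict:
    """Return the membership degree of x in every fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in MOTION_SETS.items()}

# A value of 0.48 would fall entirely into one crisp bin; here it belongs to
# 'medium' with degree 0.96 and to 'low' with degree 0.04, so association rules
# conditioned on either set can still fire with a proportional weight.
print(fuzzify(0.48))
```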
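The embedding step described in the abstract — a watermark bit placed on mid-frequency DCT coefficients of 8×8 luminance residual blocks, with strength adapted to local texture — can be sketched generically as follows. The block below assumes SciPy's dctn/idctn, an illustrative set of mid-frequency positions, a variance-based strength rule, and a non-blind detector; the thesis's method operates on AVS intra-prediction residuals inside the codec and differs in those details.

```python
# Sketch: embed one watermark bit into the mid-frequency DCT coefficients of an
# 8x8 residual block, with embedding strength adapted to local texture.
# The coefficient positions and the strength rule are assumptions of this example.
import numpy as np
from scipy.fft import dctn, idctn

# A few coefficient positions on the "middle" diagonals of the 8x8 DCT grid.
MID_FREQ = [(1, 3), (2, 2), (3, 1), (1, 4), (4, 1)]

def texture_strength(block: np.ndarray, base: float = 2.0, scale: float = 0.05) -> float:
    """More textured (higher-variance) blocks tolerate a stronger watermark."""
    return base + scale * float(np.var(block))

def embed_bit(block: np.ndarray, bit: int) -> np.ndarray:
    """Additively embed a single bit (as +/- alpha) into the mid-frequency coefficients."""
    coeffs = dctn(block.astype(float), norm="ortho")
    alpha = texture_strength(block)
    sign = 1.0 if bit else -1.0
    for (u, v) in MID_FREQ:
        coeffs[u, v] += sign * alpha
    return idctn(coeffs, norm="ortho")

def extract_bit(marked: np.ndarray, original: np.ndarray) -> int:
    """Non-blind extraction (for illustration only): compare mid-frequency coefficients."""
    diff = dctn(marked.astype(float), norm="ortho") - dctn(original.astype(float), norm="ortho")
    return int(sum(diff[u, v] for (u, v) in MID_FREQ) > 0)

# Usage on a random residual-like block:
rng = np.random.default_rng(0)
block = rng.normal(0.0, 10.0, size=(8, 8))
marked = embed_bit(block, 1)
assert extract_bit(marked, block) == 1
```

Keeping the payload on mid-frequency coefficients is the usual compromise the abstract alludes to: low-frequency changes are visible, while high-frequency changes are easily destroyed by compression.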
