
Content Analysis and Understanding of Video Commercials (视频广告内容分析与理解)

【Author】 刘楠 (Liu Nan)

【Advisor】 赵耀 (Zhao Yao)

【Author Information】 Beijing Jiaotong University, Signal and Information Processing, 2012, Doctoral dissertation

【摘要 (Abstract)】 Video commercials have become one of the most popular commercial media in today's society. They deliver indispensable commercial information to modern life and continuously, if subtly, shape the way people work and live. Every year, companies around the world spend hundreds of millions of dollars producing and placing tens of thousands of commercials, which are broadcast repeatedly on television channels in every country; while introducing all kinds of new products and services to the public, they also drive the rapid growth of related industries. Meanwhile, with the spread of digitization, people can now record huge numbers of commercials by various means so that important commercial information is available at any time. However, in the absence of effective techniques for automatic commercial content analysis, the explosive growth of recorded commercials has created urgent demands from different user groups for automatic commercial filtering, capture, and indexing. How to develop effective commercial content analysis and understanding techniques that meet these distinct needs, so that the content, broadcast time, and quality of commercials can be monitored, analyzed, stored, and queried quickly and reliably, has become a hot topic in multimedia content analysis. To address the shortcomings of existing techniques, this dissertation starts from the analysis of the latent semantic characteristics of video commercials; drawing on computer vision, machine learning, and multimedia processing techniques, it mines the semantic concepts embedded in commercials across media, constructs mid-level descriptors, fuses information interactively across modalities, and proposes practical solutions. The main contributions and innovations of this dissertation are as follows:

1) A coarse-to-fine matching strategy for video commercial recognition. To improve the efficiency of commercial recognition, a coarse-to-fine matching strategy is proposed by organically combining Locality Sensitive Hashing (LSH) with Fine Granularity Successive Elimination (FGSE). In the coarse stage, LSH accelerates the initial retrieval, filtering out most irrelevant content and producing globally approximate query results; in the fine stage, FGSE resolves the collisions left by the coarse stage, decomposing the matching features level by level to quickly locate local differences and obtain exact matches, which enables fast commercial recognition (a toy sketch of this idea follows this abstract).

2) Co-training based text detection in video commercials. Text in commercials carries important semantic information. To localize such complex text effectively, a co-training based text detection method is proposed. Text detection is cast as the classification of a special kind of texture, and a co-training mechanism with two relatively independent views is introduced to strengthen the description of text-region characteristics. To counter the noisy samples that co-training tends to introduce, an improved co-training algorithm incorporating the Bootstrap idea is proposed, in which representative samples are selected interactively across the two views to improve the generalization ability of the classifier.

3) Commercial block detection by fusing visual, audio, and textual modalities. A commercial block detection method based on the interactive fusion of visual, audio, and textual modalities is proposed. By fully exploiting the intrinsic broadcast characteristics of each modality, and for the first time within the textual modality of commercials, a comprehensive textual descriptor is built from the random spatio-temporal variation of video text regions; together with the audio and visual features it forms a complete description space. Furthermore, to overcome the limitation of existing fusion schemes that simply stack the modalities, an interactive ensemble learning algorithm, Tri-AdaBoost, is proposed to mine the complementary information carried by the mid-level descriptors of each modality, achieving an organic fusion that boosts classifier performance.

4) Cross-media analysis and fusion for commercial block segmentation. An effective commercial block segmentation method is proposed by fusing the visual, audio, and textual modalities. To strengthen the robustness of detecting the Frame Marked with Product Information (FMPI), a descriptor that plays a key role in segmentation, the textual modality and several important visual characteristics are introduced into the construction of FMPI for the first time; combined with audio descriptors, this yields a complete description space for commercial boundary characteristics. In addition, the temporal context among the descriptors of different modalities is exploited to fuse the modalities effectively and segment commercial blocks automatically.

5) Commercial semantic categorization based on a sparse visual bag-of-words representation. To improve the descriptive power of the traditional visual bag of words, sparse learning techniques, which better match the way humans interpret images, are used to build a sparse visual bag-of-words representation for commercial semantic categorization. Based on an analysis of the co-occurrence patterns of visual semantic units across a large number of commercials, the distinctive semantic units appearing in different commercial categories are mapped into an over-complete visual dictionary, and the latent semantics of a commercial are described by a sparse linear combination of the dictionary atoms; a latent mapping is thus established between the semantic information of each commercial category and its sparse bag-of-words representation, enabling semantic categorization of commercials.
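To make the coarse-to-fine matching idea of contribution 1 concrete, here is a minimal sketch that pairs a random-projection (LSH-style) coarse lookup with a successive-elimination fine check. It is an illustration only, not the implementation used in the dissertation: the fingerprint dimensionality, the 12-hyperplane hash, the L1 block size, and all names are assumptions made for this example.

```python
# Toy coarse-to-fine matcher: LSH-style bucketing for the coarse stage,
# block-wise successive elimination for the fine stage. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(x, planes):
    """Coarse stage: signs of random projections form a binary hash key."""
    return tuple((x @ planes.T > 0).astype(np.uint8))

def successive_elimination(query, candidate, best_so_far, block=16):
    """Fine stage: accumulate the L1 distance block by block and stop as
    soon as the running sum already exceeds the best distance found."""
    dist = 0.0
    for start in range(0, len(query), block):
        dist += np.abs(query[start:start + block] -
                       candidate[start:start + block]).sum()
        if dist >= best_so_far:          # early rejection
            return None
    return dist

# Stand-in database of 1000 reference "commercial fingerprints" (128-D).
db = rng.normal(size=(1000, 128))
planes = rng.normal(size=(12, 128))      # 12 random hyperplanes -> 12-bit hash

buckets = {}
for idx, vec in enumerate(db):
    buckets.setdefault(lsh_signature(vec, planes), []).append(idx)

query = db[42] + 0.01 * rng.normal(size=128)   # noisy copy of entry 42

# Coarse matching: keep only candidates that share the query's bucket.
candidates = buckets.get(lsh_signature(query, planes), range(len(db)))

# Fine matching: successive elimination over the shortlisted candidates.
best_idx, best_dist = None, np.inf
for idx in candidates:
    d = successive_elimination(query, db[idx], best_dist)
    if d is not None and d < best_dist:
        best_idx, best_dist = idx, d

print(best_idx, round(best_dist, 3))     # usually recovers index 42 here
```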

【Abstract】 As one of the most popular means of promoting products, video commercials have become an inescapable part of modern life, significantly influencing our work habits and other aspects of life. Because of their importance, tens of thousands of commercials are produced and broadcast on many TV channels to promote a variety of new commodities and services, at a cost of billions of dollars. Meanwhile, benefiting from the rapid development of digital technologies, people can conveniently record more and more commercials for commercial information acquisition. However, the explosive growth of recorded commercials creates critical demands from different user groups for practical applications (e.g. commercial filtering, capturing, and indexing) of a smart commercial content analysis and understanding (CCAU) scheme. It is highly desirable to design an effective CCAU scheme to assist these users in monitoring, browsing, and indexing daily updated commercials, and this line of research has become an intense focus in multimedia analysis. To alleviate the challenges of CCAU, several key issues are explored with state-of-the-art computer vision, machine learning, and multimedia processing technologies. Specifically, we propose a variety of mid-level descriptors to describe the intrinsic commercial semantics of different modalities. In addition, aiming at collaboratively exploiting these cross-media characteristics, several effective techniques are designed to boost the performance of the proposed CCAU methods. The following points highlight the contributions of this dissertation:

1) Video commercial recognition using a coarse-to-fine matching strategy. Aiming at improving the efficiency of video commercial recognition, a coarse-to-fine matching strategy is proposed, resorting to the effective combination of locality sensitive hashing (LSH) and fine granularity successive elimination (FGSE). Specifically, LSH is applied to accelerate the initial coarse retrieval procedure, and FGSE is employed to rapidly eliminate the irrelevant candidates that have passed the coarse matching process.

2) Enhanced co-training based video commercial text detection. To pave the way for utilizing the textual characteristics of commercials, we present an enhanced co-training based commercial text detection approach that interactively exploits the intrinsic correlation of multiple texture representation spaces. Specifically, to alleviate the problem of noisy samples in the co-training process, an enhanced co-training strategy combined with Bootstrap is proposed to improve the generalization ability of the classifier.

3) Collaboratively exploiting visual-audio-textual characteristics for video commercial block detection. We focus on commercial block detection by means of the collaborative exploitation of visual, audio, and textual characteristics embedded in commercials. Rather than relying exclusively on visual-audio characteristics as most previous works do, intrinsic textual characteristics that are associated with commercials but rarely present in general programs are fully exploited by analyzing the spatio-temporal properties of overlay text in commercials. Additionally, Tri-AdaBoost, an interactive ensemble learning method, is proposed to form a consolidated semantic fusion across visual, audio, and textual characteristics.

4) Video commercial block segmentation based on the collaborative fusion of visual-audio-textual descriptors. An effective commercial block segmentation method is proposed by collaboratively fusing visual, audio, and textual descriptors. Additional informative descriptors, including textual characteristics, are introduced to boost the robustness of detecting the frame marked with product information (FMPI). Together with the audio characteristics, FMPI provides a complementary representation architecture that models intra-commercial similarity and inter-commercial dissimilarity. In addition, the temporal relations among these multi-modal descriptors are collaboratively utilized to segment a commercial block into individual commercials.

5) Video commercial categorization using a sparse-coding-based visual bag-of-words representation. To boost the discrimination ability of the traditional visual bag of words (VBoW) in commercial categorization, a more suitable representation, i.e. sparse-coding-based VBoW, is presented to describe the co-occurrence of semantic units in different kinds of commercials. These semantic units are mapped into an over-complete dictionary, and each commercial is then represented by a sparse linear combination of the atoms in that dictionary (a toy sketch of this representation follows below).
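As a concrete illustration of the sparse-coding-based VBoW representation in contribution 5, the sketch below learns an over-complete dictionary from stand-in local descriptors and pools per-descriptor sparse codes into one fixed-length vector per commercial. The descriptor dimensionality, dictionary size, sparsity level, and the use of scikit-learn's MiniBatchDictionaryLearning and SparseCoder are assumptions made for this example, not the dissertation's pipeline.

```python
# Toy sparse-coding visual bag of words: local descriptors are coded against
# an over-complete dictionary and max-pooled per commercial. Illustrative only.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, SparseCoder

rng = np.random.default_rng(0)

# Stand-in for local visual descriptors (e.g. 64-D SIFT-like features)
# sampled from a training set of commercials.
train_descriptors = rng.normal(size=(2000, 64))

# Learn an over-complete dictionary (more atoms than descriptor dimensions).
dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                   batch_size=64, random_state=0)
dictionary = dico.fit(train_descriptors).components_   # shape (128, 64)

coder = SparseCoder(dictionary=dictionary,
                    transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)

def sparse_bow(descriptors):
    """Code each descriptor with at most 5 atoms, then max-pool over the
    whole commercial to get a single fixed-length sparse representation."""
    codes = coder.transform(descriptors)        # (n_descriptors, 128)
    return np.abs(codes).max(axis=0)            # (128,) pooled vector

# One "commercial" = a bag of local descriptors; the pooled sparse code can
# then be fed to any standard classifier for semantic categorization.
commercial = rng.normal(size=(300, 64))
print(sparse_bow(commercial).shape)             # (128,)
```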
