

Research on Content-based Multimedia Visual Information Retrieval

【Author】 Zhao Yinghai (赵英海)

【Advisor】 Wu Xiuqing (吴秀清)

【Author information】 University of Science and Technology of China, Signal and Information Processing, 2010, Ph.D.

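【Summary】 In recent years, with the development of electronic and multimedia technology, multimedia visual content has become increasingly influential as an intuitive, vivid, and more engaging form of knowledge representation. At the same time, advances in Internet and large-capacity data storage technology have effectively promoted the storage and dissemination of such content. Faced with this abundance of visual content, how to organize, represent, and search it reasonably and effectively has become a hot research topic in information retrieval. Starting from the overall framework of content-based multimedia visual information search, and taking visual content analysis as the main thread, this thesis studies four aspects: multi-label semantic concept detection, specific concept detection, ranking of tags by semantic relevance, and interactive color layout search. The main work and contributions are as follows:
1. For multi-label concept detection in multimedia visual content, a transductive semi-supervised learning method based on a sparse graph structure is proposed. Conventional methods assume that semantic concepts are mutually independent and ignore the correlation among them. The proposed method uses the principle of sparse signal representation to mine both the visual similarity among samples and the distribution correlation among concepts, and it combines this concept correlation with the consistency assumption of semi-supervised learning through a hidden Markov random field model to perform multi-label transductive semi-supervised learning. While overcoming the shortage of training samples, the algorithm mines the correlation among concepts through sparsification, which improves annotation performance and reduces model complexity. The algorithm is compared with six related algorithms on the TRECVID 2005 dataset, and the experimental results verify its effectiveness.
2. For detecting a specific concept (ship targets) in visual content, a ship detection method for visible-light remote sensing images is proposed, based on local Contrast-Box filtering on the intensity standard deviation plane. Using the local intensity standard deviation as the detection feature gives a unified description of ship targets of both black and white polarity, eliminates the influence of variations in the mean brightness of the sea background, and effectively reduces the scale of the problem. Contrast-Box locally adaptive filtering locates candidate targets on the detection feature plane and exploits the spatial structure of the targets, which effectively overcomes the influence of clouds, waves, and ship wakes.
3. For the noisy tags attached to multimedia visual content shared on the web, a tag ranking method based on the semantic relevance of visual content is proposed. Based on Bayesian theory, the algorithm gives a probabilistic definition of the semantic relevance between a tag and visual content. It considers both the prior probability that the tag itself is semantically related to visual information and the likelihood that the tag is semantically related to the specific visual content. Because different low-level features face different semantic gaps when representing different semantic content, global and local visual features are fused to estimate accurately the relevance probability between tags of different semantics and the visual content. The algorithm is unsupervised: it mines web data automatically and requires neither training samples provided in advance nor an additional training process. Experiments on a fairly large-scale Flickr dataset show that the method correctly distinguishes tags related to the visual content from tags related to contextual information.
4. Building on keyword-based multimedia visual content search, an interactive visual content search technique oriented to color layout information is proposed. Combining the two search modes helps users obtain results that are not only semantically relevant but also match their intended color layout. Color layout information is expressed by a newly designed binary feature that requires very little storage. The definition of color layout consistency considers the absolute, relative, and contextual spatial distribution consistency among the colors of interest. The consistency computation can be completed online through bit-wise operations. For the interactive interface, the thesis provides several flexible ways to select colors and express their spatial arrangement. Experiments on web image search data give a detailed analysis and validation in terms of parameter settings, time and space complexity, performance comparison with related algorithms, and a user survey.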
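
The detection idea in contribution 2 (a feature plane of local intensity standard deviations, followed by a box-versus-surround contrast test) can be illustrated with a minimal sketch. The abstract does not give the actual filter design, so the block size, the ring geometry, the `ratio` threshold, and the names `std_plane`, `contrast_box`, and `make_scene` are all assumptions made for illustration, not the thesis's implementation.

```python
from statistics import mean, pstdev


def std_plane(image, block):
    """Reduce a grayscale image (list of rows) to per-block standard deviations."""
    rows, cols = len(image) // block, len(image[0]) // block
    plane = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * block, (r + 1) * block)
                      for x in range(c * block, (c + 1) * block)]
            row.append(pstdev(pixels))
        plane.append(row)
    return plane


def contrast_box(plane, inner=0, outer=2, ratio=3.0, eps=1e-6):
    """Flag cells whose inner-box mean exceeds `ratio` times the surrounding ring mean.

    The ring is every cell within `outer` of the centre but outside the inner box.
    This thresholding rule is an assumed stand-in for the thesis's filter.
    """
    hits = []
    for r in range(outer, len(plane) - outer):
        for c in range(outer, len(plane[0]) - outer):
            box, ring = [], []
            for dr in range(-outer, outer + 1):
                for dc in range(-outer, outer + 1):
                    target = box if max(abs(dr), abs(dc)) <= inner else ring
                    target.append(plane[r + dr][c + dc])
            if mean(box) > ratio * (mean(ring) + eps):
                hits.append((r, c))
    return hits


def make_scene(sea, ship):
    """Synthetic 28x28 scene: uniform sea with a small 2x4 ship (toy data)."""
    image = [[sea] * 28 for _ in range(28)]
    for y in (13, 14):
        for x in range(12, 16):
            image[y][x] = ship
    return image


# A bright ship on dark sea and a dark ship on bright sea give the same response,
# because the standard deviation ignores the sign of the contrast.
bright_hits = contrast_box(std_plane(make_scene(50, 200), block=4))
dark_hits = contrast_box(std_plane(make_scene(120, 20), block=4))
print(bright_hits, dark_hits)
```

In this toy scene the feature plane is 16 times smaller than the image, which mirrors the claim that the standard deviation feature reduces the scale of the problem. A real sea surface is not uniform, so a practical threshold would have to be tuned against clouds and waves.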

【Abstract】 With the rapid advances in electronic and multimedia technology, multimedia visual content, a vivid and engaging form of knowledge representation, has become increasingly influential in recent years. Meanwhile, the development of the Internet and of large-scale data storage has further accelerated the storage and propagation of such content. How to organize, represent, and retrieve this huge volume of visual content has become a central problem in the information retrieval community. Working within the overall framework of content-based multimedia visual content retrieval, this thesis investigates four aspects of visual content analysis: multi-label concept detection, specific concept detection, tag ranking based on semantic relevance, and interactive color layout retrieval. The main contributions are as follows:
1. For multi-label concept detection in visual content, we propose a sparse-graph-based transductive semi-supervised learning method. Conventional methods assume that concepts occur independently and hence neglect the correlation among them. We exploit sparse signal representation theory to mine the visual similarity among instances and the distribution correlation among concepts. The concept correlation and the consistency assumption of semi-supervised learning are then integrated under a hidden Markov random field framework. Semi-supervised learning overcomes the lack of training data, and the sparse representation captures concept correlation more reasonably and efficiently, which improves annotation performance and reduces model complexity. The method is evaluated on the TRECVID 2005 dataset in extensive comparative experiments against 6 related methods.
2. For the detection of a specific concept (ships) in visual content, we propose a ship detection scheme for visible-light remote sensing images, based on Contrast-Box filtering on a 2-D feature plane constructed from the local intensity standard deviation. Taking the intensity standard deviation as the detection feature yields a consistent characterization of ships of both white and black polarity, removes the effect of brightness variation in the sea background, and reduces the problem to a manageable scale. The Contrast-Box filtering process detects target candidates on the feature plane adaptively by exploiting the spatial structure of the targets, and it removes false alarms caused by clouds, waves, and ship wakes.
3. For the noisy social tags of community-contributed multimedia visual content, we propose a tag ranking algorithm based on the semantic relatedness of visual content. The semantic relatedness between a tag and visual content is formulated probabilistically using Bayes' theorem, taking into account both the prior probability that the tag is related to visual information and the likelihood that the tag is related to the specific visual content. Moreover, because different visual features have semantic gaps of different sizes when representing different semantic content, global and local features are fused to estimate this probability more accurately. The proposed method is unsupervised in nature: it operates on Internet data and needs neither training data nor a model training stage. The method is evaluated on a large-scale Flickr image dataset, and the experimental results demonstrate that tags related to the visual content can be distinguished effectively from contextual tags.
4. As a supplement to keyword-based visual content retrieval, we propose an interactive multimedia search scheme based on the color layout of visual content, which helps users obtain search results that are not only semantically relevant but also consistent with the color layout they have in mind. The color layout is characterized by a novel binary feature that is compact in storage. The consistency definition between color layouts simultaneously considers the absolute, relative, and contextual spatial distribution consistency of the colors. The consistency computation can be completed online through bit-wise operations. Moreover, a convenient interactive interface allows users to specify the color layout of interest flexibly. Extensive experiments on Internet image search results evaluate the approach in terms of parameter sensitivity, time and space complexity, performance comparison, and a user study.
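
The Bayesian scoring in contribution 3 can be sketched as a prior multiplied by a likelihood. The abstract does not specify how either term is estimated, so everything below is an assumption for illustration: the Gaussian kernel density estimate, the weighted fusion of global and local likelihoods, the `visual_prior` values, and the toy feature vectors.

```python
import math


def kernel_likelihood(query, examples, bandwidth=1.0):
    """Gaussian kernel density of `query` among feature vectors that carry a tag."""
    if not examples:
        return 0.0
    total = 0.0
    for vector in examples:
        dist2 = sum((a - b) ** 2 for a, b in zip(query, vector))
        total += math.exp(-dist2 / (2 * bandwidth ** 2))
    return total / len(examples)


def rank_tags(image, tags, tagged_examples, visual_prior, weight=0.5):
    """Order tags by prior(tag) * fused likelihood(image | tag), highest first."""
    scores = {}
    for tag in tags:
        examples = tagged_examples[tag]
        global_lik = kernel_likelihood(image["global"], [e["global"] for e in examples])
        local_lik = kernel_likelihood(image["local"], [e["local"] for e in examples])
        # Assumed fusion rule: convex combination of the two likelihoods.
        likelihood = weight * global_lik + (1 - weight) * local_lik
        scores[tag] = visual_prior[tag] * likelihood
    return sorted(tags, key=scores.get, reverse=True)


# Toy data: "sunset" is a visual tag, "2009" a contextual one.
image = {"global": [0.9, 0.1], "local": [0.8]}
tagged_examples = {
    "sunset": [{"global": [0.85, 0.15], "local": [0.75]},
               {"global": [0.95, 0.05], "local": [0.9]}],
    "2009": [{"global": [0.1, 0.9], "local": [0.2]},
             {"global": [0.5, 0.5], "local": [0.5]}],
}
visual_prior = {"sunset": 0.8, "2009": 0.1}  # hypothetical prior values

ranking = rank_tags(image, ["2009", "sunset"], tagged_examples, visual_prior)
print(ranking)
```

No classifier is trained here: the scores come directly from other tagged images, which is in the spirit of the unsupervised setting the abstract describes.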

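The binary color layout feature and bit-wise consistency computation in contribution 4 can be illustrated with a small sketch. The actual feature design is not given in the abstract, so the 4x4 grid, the one-bitmask-per-color encoding, and the names `encode` and `consistency` are assumptions. Only the absolute-position case is covered, and the relative and contextual consistency terms are not modelled.

```python
GRID = 4  # hypothetical layout grid: 4x4 cells, so 16 bits per color


def row(r):
    """All cells of grid row `r` as (row, col) pairs."""
    return [(r, c) for c in range(GRID)]


def encode(layout):
    """Map each color to an integer bitmask with one bit per occupied grid cell."""
    return {color: sum(1 << (r * GRID + c) for r, c in set(cells))
            for color, cells in layout.items()}


def consistency(query, image):
    """Fraction of the query's requested cells where the image has the same color."""
    wanted = matched = 0
    for color, query_mask in query.items():
        wanted += bin(query_mask).count("1")
        # Bit-wise AND keeps only the cells present in both layouts.
        matched += bin(query_mask & image.get(color, 0)).count("1")
    return matched / wanted if wanted else 0.0


# The user sketches blue along the top and green along the bottom.
query = encode({"blue": row(0), "green": row(3)})

sky_over_grass = encode({"blue": row(0) + row(1), "green": row(2) + row(3)})
upside_down = encode({"green": row(0) + row(1), "blue": row(2) + row(3)})
partial = encode({"blue": [(0, 0), (0, 1)], "green": row(3)})

scores = [consistency(query, img) for img in (sky_over_grass, upside_down, partial)]
print(scores)
```

Because each color occupies a single small integer, comparing a query against an image costs a few AND and popcount operations per color, which is why a check of this kind is cheap enough to run online over keyword search results.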