
面向感知的图像检索及自动标注算法研究

Research on Perception Oriented Image Retrieval and Automatic Image Annotation

【Author】 冯松鹤 (Feng Songhe)

【Advisor】 须德 (Xu De)

【Author Information】 Beijing Jiaotong University, Computer Application Technology, 2009, Ph.D.

【摘要 (Abstract)】 With the development of multimedia and computer network technology, the volume of image data that people encounter is growing at an unprecedented rate. Faced with such massive image resources, content-based image retrieval (CBIR) systems, used to analyze, organize, and manage image data effectively, have become a research focus in multimedia technology. The main difficulty lies in enabling computers to understand the semantic information of images from the perspective of human cognition, so as to bridge, as far as possible, the semantic gap between low-level image features and high-level semantics. The first half of the thesis studies perception-oriented image retrieval algorithms, focusing on how to extract semantic content from images that matches user perception and how to effectively incorporate the user's high-level semantics to improve retrieval performance. The second half studies automatic image annotation algorithms, focusing on how to build effective machine learning models to describe and solve the annotation problem and how to improve the effectiveness of training samples in order to refine annotation performance. For region-based image retrieval, accurately extracting the parts of an image that interest the user is the key to the retrieval problem. To address the ambiguity that only some regions of an image match the user's retrieval intention, a fully data-driven image retrieval algorithm based on the selective visual attention mechanism is proposed. The algorithm first generates a saliency map with an attention model and, by combining it with the edge map and the segmentation map, automatically selects salient edges and salient regions to represent the user's retrieval intention; effective visual features are then used to describe the salient information, and a feature fusion strategy finally realizes semantics-based image retrieval. For image retrieval that combines machine learning theory with the relevance feedback mechanism, the thesis focuses on applying graph-based semi-supervised learning to region-based image retrieval. When no feedback or only positive example images are available, the retrieval problem is cast as a transductive learning problem: a hierarchical graph model incorporating region saliency information is constructed, and the manifold-ranking algorithm is used for label propagation. When the user feeds back both positive and negative example images, a region-level similarity adjacency matrix is built from the feedback images, an iterative graph-learning computation selects the set of region-level feature vectors that match the user's query semantics, and this set is then used for region-level image retrieval. For automatic image annotation, by analyzing the ambiguity that exists in both the input space and the output space of the annotation problem, a multi-instance multi-label learning algorithm under a semi-supervised learning framework is proposed. The algorithm first introduces an improved diverse density computation to measure the semantic similarity between each instance in the positive and unlabeled training bags and a given keyword; instances whose diverse density values satisfy a predefined condition are then selected as instance prototypes representing the keyword and modeled semantically with a Gaussian mixture model; on this basis, an effective feature mapping strategy is proposed to redefine the sets of positive and unlabeled bags; finally, a graph-based semi-supervised learning algorithm propagates the label of the given keyword. For image annotation refinement, noting that most existing annotation algorithms ignore how typically a training sample represents a keyword, a confidence weight computation method based on the idea of kernel density estimation is proposed, in which a real value in [0, 1] reflects how typical a training sample is of the keyword it represents. On this basis, an improved Citation-kNN multiple-instance learning algorithm is used to determine the class labels of images to be annotated. The algorithm does not need to solve for the target instance of each keyword; instead, following the idea of lazy learning, it makes the class decision directly at the bag level for the image to be annotated, thereby completing the annotation task.
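The graph-based label propagation (manifold ranking) invoked repeatedly above has a standard closed form; the following is the usual textbook formulation (after Zhou et al.), given only as background, since the thesis's own contribution lies in how the graph and its region-saliency weights are constructed, which is not reproduced here:

```latex
% Standard manifold-ranking / label-propagation formulation (after Zhou et al., 2004);
% given as background, not quoted from the thesis.
% W : affinity matrix over images or regions (the thesis builds it with region saliency),
% D : diagonal degree matrix with D_{ii} = \sum_j W_{ij},
% y : initial label vector (1 for positive feedback items, 0 otherwise),
% \alpha \in (0,1) : propagation weight.
\[
  S = D^{-1/2} W D^{-1/2}, \qquad
  f^{(t+1)} = \alpha S f^{(t)} + (1-\alpha)\, y
\]
\[
  f^{*} = \lim_{t \to \infty} f^{(t)} = (1-\alpha)\,(I - \alpha S)^{-1}\, y
\]
% Items are ranked by their converged scores f^{*}_i.
```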

【Abstract】 With the development of multimedia technology and computer networks, content-based image retrieval (CBIR) systems have become increasingly important for organizing, indexing, and retrieving massive image collections in many application domains, and CBIR has emerged as a hot research topic in recent years. The main difficulty of CBIR lies in making computers understand the semantic information of images from the human perceptual point of view and in narrowing the well-known semantic gap between low-level visual features and high-level semantic concepts. The former part of this dissertation focuses on human-perception-oriented image retrieval, in particular on how to extract semantic information from images and how to effectively incorporate the user's high-level semantics to improve retrieval performance. The latter part focuses on automatic image annotation, in particular on how to establish an effective machine learning model to solve the annotation problem and how to improve the effectiveness of training samples in order to refine annotation performance. For region-based image retrieval, the author argues that in most cases the user is interested in only a portion of the image, while the rest of the image is irrelevant. To resolve this ambiguity, a fully data-driven image retrieval algorithm based on a selective visual attention model is proposed. First, a saliency map is generated by the attention model, and salient edges and salient regions, which can be regarded as the user's retrieval intention, are extracted automatically by fusing the edge map and the segmented image with the corresponding saliency map. Effective feature descriptors are then proposed and fused for the final semantic image retrieval. For retrieval that combines machine learning theory with the relevance feedback mechanism, the dissertation focuses on graph-based semi-supervised learning applied to region-based image retrieval. Two schemes, both of which incorporate region saliency into the graph-based semi-supervised learning framework, are used to handle two types of feedback. First, when no samples or only positive samples are available from the user's feedback, the retrieval task is solved in a transductive manner: a hierarchical graph model that incorporates region saliency information is constructed, and the manifold-ranking algorithm is then used for positive label propagation. Second, when the user provides both positive and negative samples, a region-level adjacency matrix is constructed from the feedback samples, and the manifold-ranking algorithm is again used to select the instances that truly represent the user's query semantics; the selected instances are then used to retrieve the relevant images.
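As an illustration of the attention-driven region selection described above, the sketch below scores each segmented region by its mean saliency and keeps the most salient ones; the selection rule (covering a fixed share of the total saliency mass) and the assumption that a saliency map and a segmentation label map are already available are choices made for this example, not the thesis's exact procedure.

```python
import numpy as np

def select_salient_regions(saliency_map, segment_labels, keep_ratio=0.6):
    """Pick the regions whose mean saliency is highest.

    saliency_map   : (H, W) float array in [0, 1], e.g. from an Itti-style attention model.
    segment_labels : (H, W) int array, one label per segmented region.
    keep_ratio     : fraction of the total saliency mass the kept regions must cover
                     (an assumed selection rule for this sketch).
    Returns the region labels regarded as the user's likely retrieval intention.
    """
    labels = np.unique(segment_labels)
    # Mean saliency of each region.
    scores = np.array([saliency_map[segment_labels == l].mean() for l in labels])
    # Sort regions from most to least salient.
    order = np.argsort(scores)[::-1]
    cumulative = np.cumsum(scores[order]) / scores.sum()
    # Keep the smallest prefix of regions covering `keep_ratio` of the saliency mass.
    cutoff = int(np.searchsorted(cumulative, keep_ratio)) + 1
    return [int(labels[i]) for i in order[:cutoff]]
```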
For automatic image annotation, by analyzing the fact that the annotation problem is ambiguous in both the input space and the output space, the dissertation presents a novel semi-supervised multi-instance multi-label (SSMIML) learning framework that aims to take full advantage of both labeled and unlabeled data. Specifically, a reinforced diverse density algorithm is first applied to select the instance prototypes (IPs) for a given keyword from both positive and unlabeled bags. The selected IPs are then modeled with a Gaussian mixture model (GMM) to reflect the semantic class density distribution. Based on the class distribution for a keyword, both positive and unlabeled bags are redefined using a novel feature mapping strategy, so that each bag is represented by a fixed-length feature vector and the manifold-ranking algorithm can subsequently propagate the corresponding label from positive bags to unlabeled bags directly. For image annotation refinement, most existing algorithms rarely take into account the fact that the samples relevant to a certain keyword generally differ in how typical of, or relevant to, that keyword they are. Inspired by kernel density estimation, the dissertation proposes a confidence weight computation algorithm that uses a real number in [0, 1] to represent a sample's relevancy score for a keyword. Moreover, an improved Citation-kNN multiple-instance learning algorithm is proposed to solve the annotation problem. In contrast with existing annotation algorithms that learn an explicit correspondence between keywords and target concepts, the proposed method annotates unlabeled images directly in a lazy-learning manner.
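To make the kernel-density-estimation idea concrete, the sketch below assigns each training sample of a keyword a confidence weight in [0, 1] proportional to a leave-one-out Gaussian kernel density estimate over the other samples of that keyword; the Gaussian kernel, the fixed bandwidth, and the min-max normalization are assumptions made for this example rather than the exact weighting scheme used in the dissertation.

```python
import numpy as np

def kde_confidence_weights(features, bandwidth=1.0):
    """Confidence weight of each training sample for one keyword.

    features  : (n, d) array, feature vectors of the samples annotated with the keyword.
    bandwidth : Gaussian kernel bandwidth (assumed fixed here; could be tuned instead).
    Returns an (n,) array of weights in [0, 1]; samples lying in dense parts of the
    keyword's feature distribution (i.e. typical samples) receive weights close to 1.
    """
    n = features.shape[0]
    # Pairwise squared Euclidean distances between samples.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Leave-one-out Gaussian kernel density estimate at each sample.
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(kernel, 0.0)
    density = kernel.sum(axis=1) / max(n - 1, 1)
    # Min-max normalize to [0, 1] so weights are comparable across keywords.
    lo, hi = density.min(), density.max()
    return (density - lo) / (hi - lo) if hi > lo else np.ones(n)
```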
