
互联网环境下大规模图像的内容分析、检索和自动标注的研究

Large Scale Image Content Analysis, Retrieval, and Automatic Annotation in Web Environment

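【Author】 Changhu Wang (王长虎)

【Advisors】 Hongjiang Zhang (张宏江); Mingjing Li (李明镜)

【Author information】 University of Science and Technology of China, Signal and Information Processing, 2009, PhD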

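【Abstract (translated from the Chinese)】 With the spread and development of the Internet and digital photography devices, the number of images on the Internet is growing rapidly. On the one hand, this mass of images attracts more and more users; on the other hand, the ever richer image resources make it hard for users to find the information they really need in a vast sea of data. Fast and effective image retrieval has therefore become an important research direction in both industry and academia. Current Internet image retrieval falls into two main categories: text-based image retrieval (TBIR) and content-based image retrieval (CBIR). TBIR is widely used in commercial image search engines. In a TBIR system, the textual information of Web images is used to index and search them, so the quality of the textual annotations is an important issue. CBIR is a very popular direction in academia. In a CBIR system, the visual content of images is used for indexing. Its main difficulty is the semantic gap: low-level content features (such as color) cannot effectively describe high-level semantics (such as "dog"). In this dissertation, we try to make full use of the rich textual and visual information of Web images to address these problems. We study in depth automatic image annotation, image annotation refinement, reducing the semantic gap in Web image retrieval, and object-based image retrieval. In addition, to better process and exploit the massive data on the Internet and to help users more effectively during online retrieval, we pay particular attention to scalability to large image collections and to real-time performance when designing the algorithms and implementing the retrieval systems. The main results and contributions of the dissertation are as follows:
1. We discuss and analyze the automatic image annotation problem and propose a multi-label sparse coding framework for feature extraction and classification, which we apply to automatic image annotation. We argue that the semantic similarity between two images with partially overlapping labels should be measured by reconstruction rather than one-to-one. In this framework, the semantic similarity between image label vectors and the semantic similarity between image feature vectors are therefore both measured by one-to-all ℓ1 sparse reconstruction/coding.
2. We discuss and analyze large-scale automatic image annotation and propose a search-based image annotation framework. Under this framework, we provide an online image annotation service that annotates arbitrary user-submitted images in real time. We collected a large-scale image database from the Internet and use it as the training set for annotating any image. The use of fast retrieval techniques and of the large-scale image database ensures that the framework scales to large image collections and runs in real time.
3. We discuss and analyze the image annotation refinement problem. We formulate annotation refinement as a Markov process and explain existing annotation refinement work within this formulation. To address the problems of existing work, we propose a content-based image annotation refinement algorithm. The effectiveness of the Markov process formulation, together with full use of the content information of both the image to be annotated and the training images, allows the proposed algorithm to mitigate to a large extent several problems of existing algorithms.
4. We discuss and analyze the semantic gap in content-based image retrieval on the Internet and propose a ranking-based distance metric learning algorithm. Guided by the rich textual information of Web images, we try to learn a new distance metric in the visual space such that, given a query image, images retrieved from the database under this metric are semantically more relevant to the query. Based on this algorithm, we propose a large-scale content-based image retrieval (CBIR) framework and implement a real-time CBIR system on a Web image database of 2.4 million images.
5. We discuss and analyze the use of multiple-instance semi-supervised learning (MISSL) for object-based image retrieval. We propose a new regularization framework for the MISSL problem and, based on it, a graph-based multiple-instance learning (GMIL) algorithm to solve MISSL. Within the same framework, GMIL reduces to a new standard multiple-instance learning algorithm (GMIL-M) and to a standard semi-supervised learning algorithm (GMIL-S). We prove theoretically that GMIL-S has a closed-form solution and that the iterative solutions of GMIL and GMIL-M converge. We apply GMIL to object-based image retrieval, and experimental results confirm its effectiveness.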
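The search-based annotation idea in contribution 2 (retrieve visually similar annotated images, then transfer their labels) can be illustrated with a minimal sketch. This is a generic nearest-neighbor tag-voting example, not the dissertation's actual system: the cosine similarity, the linear scan (standing in for the fast retrieval techniques the abstract mentions), the function names `cosine` and `annotate`, and the toy `DATABASE` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def annotate(query, database, k=3, top_tags=2):
    """Annotate `query` by similarity-weighted tag voting over its k nearest images.

    `database` is a list of (feature_vector, tags) pairs. A linear scan is used
    here for clarity; a large-scale system would use an index instead.
    """
    ranked = sorted(database, key=lambda item: cosine(query, item[0]), reverse=True)
    scores = defaultdict(float)
    for features, tags in ranked[:k]:
        weight = cosine(query, features)
        for tag in tags:
            scores[tag] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_tags]


# Hypothetical toy collection: 3-dimensional "visual features" with Web tags.
DATABASE = [
    ([0.9, 0.1, 0.0], ["dog", "grass"]),
    ([0.8, 0.2, 0.1], ["dog", "park"]),
    ([0.1, 0.9, 0.2], ["beach", "sea"]),
    ([0.0, 0.8, 0.3], ["sea", "sky"]),
]

if __name__ == "__main__":
    print(annotate([1.0, 0.1, 0.0], DATABASE, k=2, top_tags=2))
```

Tags shared by several near neighbors accumulate the most weight, which is why a larger annotated collection tends to help this kind of approach.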

【Abstract】 With the prevalence of the Internet and digital cameras, the number of digital images on the Web is growing rapidly. On the one hand, the increasing number of images attracts more and more users; on the other hand, it is not easy for ordinary users to find what they really need in this sea of images. Effective and efficient image retrieval has therefore become an important research direction in both industry and academia. Currently, there are two main image retrieval frameworks: text-based image retrieval (TBIR), which is widely used in commercial image search engines, and content-based image retrieval (CBIR), which is an active research topic in the academic community. In text-based systems, images are indexed and retrieved using the textual information of Web images, so the quality of image annotations is one of the most important issues. In content-based image retrieval, images are indexed by their visual content, and a key problem is the semantic gap between low-level visual features and high-level semantic concepts. In this dissertation, we try to fully utilize the rich textual and visual information of Web images to solve the above-mentioned problems in Web image retrieval.
The following key techniques of Web image retrieval are discussed: automatic image annotation, image annotation refinement, reducing the semantic gap in Web image retrieval, and object-based image retrieval. Moreover, to better handle and utilize the large amount of data on the Web, and to better assist users during online retrieval, we pay particular attention to the scalability and efficiency of the proposed algorithms and developed systems. The main contributions of the dissertation are as follows:
1. We present a multi-label sparse coding framework for feature extraction and classification in the context of automatic image annotation. We argue that the semantic similarity of two images with partially overlapping labels should be measured by reconstruction rather than one-to-one. Accordingly, the semantic similarities of label vectors and of image features are both measured by one-to-all ℓ1 sparse reconstruction/coding.
2. We study the problem of large-scale automatic image annotation and propose a search-based image annotation framework. Under this framework, an online image annotation service has been deployed to annotate arbitrary images submitted by users in real time. A Web-scale image database is crawled from the Web and used as the training set to annotate an arbitrary image. The combination of efficient search technologies and a Web-scale image set ensures the scalability of the proposed algorithm.
3. We study the problem of image annotation refinement. We formulate the annotation refinement process as a Markov process, and use this formulation to explain some existing annotation refinement algorithms. To address the problems of existing algorithms, we propose a content-based image annotation refinement algorithm. Owing to the effectiveness of the Markov process formulation and the use of the content information of the query image as well as the training images, the proposed algorithm resolves the problems of existing algorithms to a large extent.
4. We study the problem of bridging the semantic gap in content-based image retrieval on the Web, and propose a ranking-based distance metric learning algorithm. Guided by the rich textual information of Web images, the proposed framework tries to learn a new distance measure in the visual space, which can be used to retrieve more semantically relevant images for any unseen query image. Based on this algorithm, we propose a novel framework for large-scale content-based image retrieval (CBIR). We also implement a real-time CBIR system on a dataset of 2.4 million Web images.
5. We study the use of multiple-instance semi-supervised learning (MISSL) to solve the object-based image retrieval problem. A novel regularization framework for MISSL is presented. Based on this framework, a graph-based multiple-instance learning (GMIL) algorithm is proposed to solve the MISSL problem. Under the proposed framework, GMIL can be reduced to a novel standard MIL algorithm (GMIL-M) and a standard SSL algorithm (GMIL-S). We theoretically prove the existence of a closed-form solution for GMIL-S and the convergence of the iterative solutions for GMIL and GMIL-M. We apply the GMIL algorithm to object-based image retrieval. Experimental results show the superiority of the proposed method.
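To make the Markov-process view of annotation refinement in contribution 3 more concrete, the following is a minimal sketch of one common instance of that idea: a random walk with restart over candidate tags. It is not the dissertation's algorithm. In particular, it uses only a hypothetical tag-to-tag similarity table, whereas the abstract states that the proposed method also exploits image content. The function name `refine`, the parameter `alpha`, and the toy scores are assumptions made for illustration.

```python
def refine(initial, similarity, alpha=0.5, iterations=50):
    """Re-rank candidate tags with a random walk with restart.

    initial:    dict tag -> initial confidence (positive numbers).
    similarity: dict frozenset({tag_a, tag_b}) -> non-negative similarity.
    alpha:      probability of following a similarity edge; with probability
                1 - alpha the walk restarts from the initial confidences.
    """
    tags = sorted(initial)
    total = sum(initial.values())
    restart = [initial[t] / total for t in tags]

    # Row-normalise similarities into a transition matrix (no self-loops).
    transition = []
    for a in tags:
        row = [0.0 if a == b else similarity.get(frozenset((a, b)), 0.0) for b in tags]
        row_sum = sum(row)
        # A tag with no neighbours jumps back to the restart distribution.
        transition.append([x / row_sum for x in row] if row_sum else restart[:])

    rank = restart[:]
    n = len(tags)
    for _ in range(iterations):
        rank = [
            alpha * sum(rank[i] * transition[i][j] for i in range(n))
            + (1 - alpha) * restart[j]
            for j in range(n)
        ]
    return dict(zip(tags, rank))


# Hypothetical candidate tags for one image: "car" is a likely noisy label.
INITIAL = {"dog": 0.4, "grass": 0.3, "car": 0.3}
SIMILARITY = {
    frozenset(("dog", "grass")): 0.9,
    frozenset(("dog", "car")): 0.1,
    frozenset(("grass", "car")): 0.1,
}

if __name__ == "__main__":
    for tag, score in sorted(refine(INITIAL, SIMILARITY).items(), key=lambda kv: -kv[1]):
        print(f"{tag}: {score:.3f}")
```

In this toy setting, mutually consistent tags reinforce each other while the weakly connected tag loses score, which is the qualitative behavior a refinement step is meant to achieve.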
