Research on Image Annotation and Image Annotation Refinement

【Author】 Liu Zheng (刘峥)

【Supervisor】 Ma Jun (马军)

【Author Information】 Shandong University, Computer Architecture, 2011, Ph.D.

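【Abstract (Chinese version)】 With the rapid spread of digital cameras and camera-equipped mobile phones, the number of digital images is growing explosively, and with the rapid development of the Internet, more and more people can use these images conveniently, quickly, and cheaply. The problem today is no longer a shortage of image data but how to find the information one needs in a vast sea of images, so fast and efficient retrieval over very large image collections has become an urgent problem. Existing image retrieval systems mainly rely on the semantic annotation words of images to perform semantic image retrieval, but as the number of images surges, manual annotation is clearly impractical. Automatic semantic image annotation has therefore become an important problem in image retrieval and has attracted growing attention from both academia and industry. Because the accuracy of existing annotation methods is not yet satisfactory, optimizing and refining the annotations of already-annotated images has become one of the key problems in this research area. For different types of images, this dissertation proposes a series of targeted methods for semantic annotation and annotation refinement. The main results and contributions fall into the following five areas.
(1) An image annotation method based on the LDA topic model. First, a bag-of-visual-words model is built from the training set, and the LDA model is used to compute the relevance between the image to be annotated and each word in the annotation vocabulary, yielding the initial annotation. Next, a search-based annotation expansion method is proposed: the initial annotation is submitted to an image search engine, images similar to the target image are selected from the returned results, and expanded annotation words are extracted from the text surrounding those similar images. Finally, the initial and expanded annotation sets are merged to obtain the final annotation.
(2) An image annotation method for social image-sharing communities. Such sites let users tag images when uploading them, and we use these user-supplied tags to annotate images. First, the regions obtained by segmenting the target image serve as query points; the user-supplied tags are filtered to obtain the image's initial tags, whose corresponding visual features serve as the data points to be ranked, and a manifold-ranking algorithm ranks the initial tags. Next, the top-ranked initial tags are expanded using the Flickr API and a weighted voting strategy, giving the expanded tags. Finally, the top-ranked initial tags and the expanded tags are merged into the final annotation.
(3) An image annotation method for personal photo albums in image-sharing communities. First, locality-sensitive hash functions map the image's SIFT descriptors into hash buckets; treating each bucket as one histogram bin converts the image into a histogram, and the visual similarity of two images is obtained from the distance between their histograms, which is used to remove duplicate images from the album. Then, a tripartite graph is built from the images' visual features and GPS coordinates, and the album's images are clustered by partitioning this graph. Using the Corel5K dataset as the training set, a bag-of-visual-words model is built and a visual word vector is computed for each word in the dataset's annotation vocabulary. For each image cluster obtained from the album, the bag-of-visual-words model gives the cluster's visual word vector, and the vocabulary words most relevant to it are selected as the cluster's annotation.
(4) A hierarchical Web image annotation method based on a bipartite-graph enhancing learning algorithm and concept ontology reasoning. First, initial annotations of the image are extracted from the Web page and reasoned over with a concept ontology; the initial annotations and the hierarchical expanded annotations obtained by ontology reasoning form the vertices of a bipartite graph. Then the bipartite-graph enhancing algorithm ranks the initial and expanded annotations, and an annotation selection strategy is proposed to choose the image's final annotation words from the ranked initial and expanded sets.
(5) An image annotation refinement algorithm based on graph partitioning and an image search engine. The algorithm improves annotation accuracy by removing noise from the candidate annotation words of the target image. Its core idea is to treat candidate annotation words as graph vertices and the relevance between words as edge weights, which turns annotation refinement into a graph partitioning problem. The edge weight is computed by weighting the inter-word relevance with two parameters: the first is the relevance between a candidate word and the target image's visual features, computed from the results returned by an image search engine; the second is the importance of the candidate word in the page hosting the target image, and applies only to Web images. A heuristic max-cut algorithm then partitions the graph, and one of the two resulting word sets is chosen as the final annotation.
This research on semantic image annotation and its refinement helps in understanding the semantic concepts contained in images, improves the performance of image retrieval systems, and is of considerable significance to multimedia research.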
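Item (1) scores every annotation word against the image through the LDA model and then merges the initial words with words mined from search results. The abstract does not give the exact relevance formula, so the sketch below assumes the common mixture form relevance(w) = Σ_z θ_z · φ_{z,w}, where θ is the image's topic mixture inferred from its visual words and φ_z is topic z's distribution over annotation words. Both are taken as already learned; the numbers and the helper names (`score_words`, `initial_annotation`, `merge_annotations`) are hypothetical.

```python
from typing import Dict, List


def score_words(theta: List[float], phi: List[Dict[str, float]]) -> Dict[str, float]:
    """Relevance of each annotation word w: sum over topics z of theta[z] * phi[z][w].

    theta: the image's topic mixture (assumed already inferred from its visual words).
    phi:   per topic, a distribution over annotation words (assumed already trained).
    """
    scores: Dict[str, float] = {}
    for weight, topic in zip(theta, phi):
        for word, p in topic.items():
            scores[word] = scores.get(word, 0.0) + weight * p
    return scores


def initial_annotation(theta: List[float], phi: List[Dict[str, float]], k: int) -> List[str]:
    """Top-k words by relevance; ties are broken alphabetically for determinism."""
    scores = score_words(theta, phi)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def merge_annotations(initial: List[str], extended: List[str]) -> List[str]:
    """Union of both word lists, keeping initial words first and dropping repeats."""
    merged = list(initial)
    for word in extended:
        if word not in merged:
            merged.append(word)
    return merged


if __name__ == "__main__":
    theta = [0.7, 0.3]  # hypothetical topic mixture for one image
    phi = [
        {"sky": 0.5, "sea": 0.3, "beach": 0.2},
        {"tiger": 0.6, "grass": 0.3, "sky": 0.1},
    ]
    initial = initial_annotation(theta, phi, k=3)
    print(initial)  # ['sky', 'sea', 'tiger']
    # "extended" stands in for words mined from text around similar search results.
    print(merge_annotations(initial, ["sea", "sand"]))  # ['sky', 'sea', 'tiger', 'sand']
```

Keeping the initial words first in the merged list is one simple design choice; the abstract only says the two sets are merged.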

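Item (2) ranks a photo's initial tags by manifold ranking, with the segmented image regions acting as queries. The sketch below implements the standard manifold-ranking iteration f ← αSf + (1 − α)y over a Gaussian affinity graph with symmetric normalization S = D^{-1/2} W D^{-1/2}. It assumes each region and each tag is already represented by a fixed-length visual feature vector (the abstract does not say how a tag obtains its features), and the bandwidth `sigma`, damping `alpha`, and iteration count are illustrative values, not the dissertation's settings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def manifold_rank(queries: List[Vector], candidates: List[Vector],
                  sigma: float = 1.0, alpha: float = 0.9, iters: int = 300) -> List[float]:
    """Return a ranking score for each candidate (e.g. each initial tag's feature vector).

    Queries (e.g. region features) get y = 1 and candidates y = 0; scores spread over
    a Gaussian affinity graph via f <- alpha * S f + (1 - alpha) * y.
    """
    points = list(queries) + list(candidates)
    n = len(points)
    # Gaussian affinity matrix with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}; isolated points keep zero rows.
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    y = [1.0] * len(queries) + [0.0] * len(candidates)
    f = list(y)
    # alpha < 1 makes the iteration a contraction, so f converges.
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * y[i]
             for i in range(n)]
    return f[len(queries):]


if __name__ == "__main__":
    regions = [[0.0, 0.0], [0.1, 0.0]]                 # hypothetical region features
    tags = {"tiger": [0.05, 0.1], "car": [3.0, 3.0]}  # hypothetical tag features
    scores = manifold_rank(regions, list(tags.values()))
    ranked = sorted(zip(tags, scores), key=lambda t: -t[1])
    print([name for name, _ in ranked])  # ['tiger', 'car']
```

The pure-Python loops keep the example dependency-free; for real data a matrix library and the closed form f = (I − αS)^{-1} y would be the usual choice.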
【Abstract】 With the rapid popularization of digital cameras and camera-equipped mobile phones, the number of digital images is increasing explosively. In particular, with the rapid development of the Internet, more and more people can use digital images conveniently, quickly, and at low cost. The key problem today is no longer a lack of image resources but how to find what we really need among large-scale digital image collections. Hence, retrieving images from large-scale digital image datasets rapidly and efficiently has become an urgent problem. Existing image retrieval systems mainly employ the semantic annotations of images to carry out semantic image retrieval. However, manual image annotation cannot keep up with the rapidly growing number of images. Therefore, automatic image annotation has become an important issue in image retrieval and has attracted increasing attention from both industry and academia. As the accuracy of existing image annotation methods is still far from satisfactory, how to optimize and refine image annotation results has become one of the most important problems in image annotation research.
In this dissertation, we propose a series of methods to annotate images and refine image annotations for different kinds of images. The main research contents and contributions lie in the following five aspects.
(1) We present an LDA topic model based image annotation method. First, a bag-of-visual-words model is trained on a training dataset, and initial image annotations are obtained by using the LDA model to discover the relationship between the unlabeled image and the words in the annotation dictionary. Afterwards, we propose a search-based annotation expansion method: we submit the initial annotations to an image search engine and select, from the search results, the images that are similar to the unlabeled image. Then, extended annotations are extracted from the surrounding texts of these similar images.
Finally, we combine the initial annotations and the extended annotations to build the final annotations.
(2) We present an image annotation method oriented to image sharing communities. As image sharing communities allow users to provide tags when uploading images, we exploit user-supplied tags to annotate images. First, the initial tags are ranked using a manifold-ranking algorithm: regions of the photo to be annotated serve as queries, and the algorithm ranks the initial tags according to their relevance to these queries. Next, using the Flickr API, the top-ranked initial tags are expanded by a weighted voting scheme. Finally, we combine the top-ranked initial tags with the expanded tags to construct the final annotations.
(3) We propose an image annotation approach oriented to personal photo collections downloaded from an image sharing community. First, we employ locality-sensitive hashing to map the SIFT descriptors of an image to hash tables. Given a hash table, a histogram is then obtained in which each bin corresponds to one bucket of the hash table, and image similarity is computed from the distance between histograms. Two images that belong to the same personal photo collection are considered near-duplicates when their similarity exceeds a predefined threshold. After the near-duplicate images in a personal photo collection are deleted, image visual features and GPS information are used to construct a tripartite graph, and the images of the collection are clustered by partitioning this tripartite graph. Next, the Corel5K dataset is used to build a bag-of-visual-words model and to obtain a visual word vector for each word in the annotation dictionary. For each photo cluster, a visual word vector is obtained from the bag-of-visual-words model.
Afterwards, all words in the training dataset vocabulary are ranked by the distance between the photo cluster's visual word vector and their own visual word vectors. Finally, the highest-ranked words are retained as the final annotations.
(4) We propose a hierarchical Web image annotation approach based on a bipartite graph enhancing algorithm and concept ontology reasoning. Given a Web image, initial annotations are extracted from the surrounding texts and other textual information of the hosting Web page. A concept ontology is applied to perform hierarchical probabilistic reasoning over image concepts for multi-level image annotation, which yields a set of extended annotations. The initial annotations and the extended annotations are treated as two disjoint vertex sets to construct a bipartite graph. Then, an annotation enhancing algorithm re-ranks the initial and extended annotations based on the bipartite graph. Finally, we design an annotation selection policy, and the annotations with the highest ranking scores are retained as the final annotations.
(5) We present a novel algorithm that solves the image annotation refinement (IAR) problem by graph partitioning and an image search engine. The algorithm prunes noisy words from the candidate annotation set to improve annotation performance. Its main idea is to treat candidate annotations as graph vertices and to use the relevance between two candidate annotations to construct the edge weight, which converts image annotation refinement into a weighted graph partitioning problem. The edge weight is the annotation similarity weighted by two parameters: parameter 1 is the relationship between a candidate annotation and the image's visual features, and parameter 2 is the importance of the candidate annotation in the hosting Web page. Next, we compute a Max-Cut of the graph using a heuristic algorithm.
After the graph is bi-partitioned, one of the two vertex sets is chosen as the final annotations.
In short, research on image annotation and image annotation refinement helps us understand the semantic concepts in digital images and improves the performance of image retrieval systems. Moreover, this research is of great significance to multimedia research.
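Item (3) removes near-duplicate photos by hashing each image's SIFT descriptors with locality-sensitive hashing, treating hash buckets as histogram bins, and thresholding histogram similarity. The abstract names neither the LSH family nor the histogram distance, so this sketch assumes random-hyperplane (sign) hashing with `k` bits per descriptor and histogram intersection, which for normalized histograms equals 1 minus half the L1 distance. The 0.8 threshold and all function names are hypothetical, and short toy vectors stand in for 128-dimensional SIFT descriptors.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Hist = Dict[int, float]


def make_hyperplanes(dim: int, k: int, seed: int = 0) -> List[List[float]]:
    """k random Gaussian hyperplanes through the origin (one hash bit each)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def bucket(desc: Sequence[float], planes: List[List[float]]) -> int:
    """Hash one descriptor to a bucket id in [0, 2**k) from the signs of its projections."""
    key = 0
    for plane in planes:
        dot = sum(a * b for a, b in zip(desc, plane))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def histogram(descs: List[Sequence[float]], planes: List[List[float]]) -> Hist:
    """Normalised bucket-count histogram: each hash bucket is one bin."""
    counts = Counter(bucket(d, planes) for d in descs)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()} if total else {}


def similarity(h1: Hist, h2: Hist) -> float:
    """Histogram intersection; equals 1 - L1/2 for normalised histograms."""
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))


def remove_near_duplicates(album: Dict[str, List[Sequence[float]]],
                           planes: List[List[float]], threshold: float = 0.8) -> List[str]:
    """Keep an image only if it is not too similar to any image already kept."""
    kept: List[Tuple[str, Hist]] = []
    for name, descs in album.items():
        h = histogram(descs, planes)
        if all(similarity(h, other) <= threshold for _, other in kept):
            kept.append((name, h))
    return [name for name, _ in kept]


if __name__ == "__main__":
    base = [[1.0, 2.0, 3.0, 4.0], [1.01, 2.0, 3.0, 4.0], [1.0, 2.01, 3.0, 4.0]]
    album = {
        "img1": base,
        "img1_copy": [list(d) for d in base],    # exact duplicate of img1
        "img2": [[-x for x in d] for d in base],  # opposite directions: complementary buckets
    }
    planes = make_hyperplanes(dim=4, k=6, seed=1)
    print(remove_near_duplicates(album, planes))  # ['img1', 'img2']
```

Using a single hash table keeps the example short; practical LSH setups typically combine several tables to trade precision against recall.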

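Item (5) turns annotation refinement into a max-cut problem over candidate words. The sketch below uses a greedy placement followed by single-vertex local search, one common way to approximate max-cut; it is not necessarily the heuristic used in the dissertation. The abstract does not specify how the two parameters weight the pairwise similarity or how one side is chosen, so `build_weights` (similarity times each endpoint's parameter product) and `choose_side` (the side with the larger total parameter score) are hypothetical stand-ins, as are the toy words `w1`..`w4`. At a local optimum no single vertex can switch sides to enlarge the cut, which guarantees the cut keeps at least half of the total edge weight.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = FrozenSet[str]


def build_weights(sim: Dict[Edge, float], p1: Dict[str, float],
                  p2: Dict[str, float]) -> Dict[Edge, float]:
    """Hypothetical weighting: scale each pairwise similarity by p1*p2 of both endpoints."""
    weights: Dict[Edge, float] = {}
    for edge, s in sim.items():
        u, v = tuple(edge)
        weights[edge] = s * p1[u] * p2[u] * p1[v] * p2[v]
    return weights


def cut_value(weights: Dict[Edge, float], part: Set[str]) -> float:
    """Total weight of edges with exactly one endpoint in `part`."""
    return sum(w for edge, w in weights.items() if len(edge & part) == 1)


def local_search_max_cut(vertices: List[str],
                         weights: Dict[Edge, float]) -> Tuple[Set[str], Set[str]]:
    """Greedy placement, then single-vertex flips while the cut keeps growing."""
    def w(u: str, v: str) -> float:
        return weights.get(frozenset((u, v)), 0.0)

    side: Dict[str, int] = {}
    for v in vertices:
        # Placing v on side 1 cuts its edges to side 0, and vice versa.
        to0 = sum(w(v, u) for u in side if side[u] == 0)
        to1 = sum(w(v, u) for u in side if side[u] == 1)
        side[v] = 1 if to0 > to1 else 0

    improved = True
    while improved:
        improved = False
        for v in vertices:
            same = sum(w(v, u) for u in vertices if u != v and side[u] == side[v])
            other = sum(w(v, u) for u in vertices if u != v and side[u] != side[v])
            if same > other + 1e-12:  # flipping v changes the cut by (same - other)
                side[v] = 1 - side[v]
                improved = True

    return ({v for v in vertices if side[v] == 0},
            {v for v in vertices if side[v] == 1})


def choose_side(a: Set[str], b: Set[str], p1: Dict[str, float],
                p2: Dict[str, float]) -> Set[str]:
    """Hypothetical selection rule: keep the side with the larger total p1*p2 score."""
    def score(part: Set[str]) -> float:
        return sum(p1[v] * p2[v] for v in part)
    return a if score(a) >= score(b) else b


if __name__ == "__main__":
    words = ["w1", "w2", "w3", "w4"]  # hypothetical candidate words
    sim = {frozenset(p): 1.0 for p in [("w1", "w3"), ("w1", "w4"), ("w2", "w3"), ("w2", "w4")]}
    p1 = {"w1": 0.9, "w2": 0.8, "w3": 0.3, "w4": 0.2}
    p2 = {word: 1.0 for word in words}  # e.g. non-Web images, where parameter 2 is unused
    a, b = local_search_max_cut(words, build_weights(sim, p1, p2))
    print(sorted(choose_side(a, b, p1, p2)))  # ['w1', 'w2']
```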
  • 【Online Publication Submitter】 Shandong University
  • 【Online Publication Issue】 2011, Issue 12