真实世界环境下的自动图像标注方法研究

Research on Real-World Automatic Image Annotations

【Author】 芮晓光 (Xiaoguang Rui)

【Supervisor】 俞能海 (Nenghai Yu)

【Author Information】 University of Science and Technology of China, Signal and Information Processing, 2010, Ph.D.

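【Chinese Abstract】 With the rapid development of multimedia imaging and storage technologies, the number of images on the Internet has grown explosively. Compared with text, visual images are more vivid and easier to understand, and digital images are widely used in business, news media, medicine, education and other fields. How to help users find the images they need quickly and accurately has therefore become one of the hot topics in multimedia research in recent years, and the most important technology for solving this problem is automatic image annotation. However, traditional research on automatic image annotation has mainly been conducted in constrained settings, for example on small, manually collected image databases, and has largely ignored image annotation in real-world environments. As a result, some traditional automatic image annotation methods run into many problems in practical applications: annotation performance is low, the user experience of the annotations is poor, and they cannot handle large numbers of semantic concepts. It is therefore important both to study how traditional methods can be extended to real-world environments and to develop new real-world annotation methods that address their shortcomings.

This dissertation studies the key problems of automatic image annotation in real-world environments. It investigates in depth large-scale learning algorithms for image annotation, web image annotation, image annotation in multilingual environments, and image annotation refinement. In addition, we design an image retrieval demo system based on the proposed real-world annotation algorithms and study image representation and image ranking, achieving fast and effective retrieval over a large-scale real-world image database. The main results and contributions are as follows:

1. Proposed an automatic image annotation method based on large-scale distance metric learning. First, a discriminative distance metric learning algorithm is proposed. It learns a Mahalanobis distance metric by preserving the local nonlinear structure of the data set and exploiting the discriminative information in the data, which improves the performance of K-nearest-neighbor (KNN) based annotation. Then, an aggregated distance metric learning algorithm is proposed that allows the discriminative algorithm to be trained efficiently in a parallel or online fashion, so that it can handle large-scale data. Experiments show that the aggregated algorithm not only improves annotation performance but also greatly reduces the training time of the annotation model.
2. Proposed an aggregation-based large-scale support vector machine (SVM) algorithm for automatic image annotation. SVMs are a common tool for automatic image annotation. Large-scale SVM training is achieved by first learning separately on subsets of the data and then aggregating the results, which greatly improves the scalability of the original SVM algorithm. Experiments show that, compared with common SVM algorithms, the aggregated SVM can process millions of training samples in a relatively short time with essentially no loss in performance.
3. Proposed a web image annotation algorithm based on a bipartite graph reinforcement model. The key to web image annotation is how to use the text already associated with a web image. The proposed algorithm extracts a number of words from this existing text as candidate annotations, expands them into further annotations using large-scale image data, and models all annotations as a bipartite graph. A reinforcement algorithm on the bipartite graph then re-ranks the annotations. Experimental results show that the proposed algorithm greatly improves the quality of the original annotations of web images.
4. Proposed an image annotation method based on a statistical model. By clustering and statistically modeling a large-scale web image data set, it annotates both personal images and web images quickly and effectively. Experiments show that, compared with existing algorithms, it not only improves annotation performance but also greatly increases annotation speed, reaching 20 images per second.
5. Proposed a cross-language automatic image annotation framework. The framework can use large-scale multilingual web image data sets as training data and automatically provides multilingual annotation results according to the user's native language. Within the framework, a multilingual annotation fusion algorithm, MAF, is proposed that ranks and translates annotations simultaneously. MAF models the candidate annotations as an n-partite graph and then uses an iterative algorithm to improve both multilingual annotation performance and translation quality. Experimental results show that the framework improves annotation performance and can provide users with multilingual annotations.
6. Proposed an optimization-based image annotation refinement algorithm, together with a unified image annotation framework built on it. The algorithm uses both prior knowledge of the annotations and the local semantic correlations among annotations, and formulates annotation refinement as a 0-1 integer programming problem, enabling parameter-free refinement. The problem can be solved quickly via semidefinite optimization. Unlike previous methods, the algorithm determines the final annotations directly, without any empirically set threshold. Experimental results demonstrate its effectiveness.
7. Proposed a spatial-relation-based visual representation of images and a static image ranking algorithm that takes image quality and importance into account. Combined with the proposed automatic annotation algorithms, a real-time image retrieval demo system over a large-scale database is designed and implemented.

In summary, this research on automatic image annotation in real-world environments helps to understand the deep connection between images and concepts and to build a unified representation model for visual information. It is of considerable significance to multimedia research and also offers useful insights for the exploration and development of large-scale learning theory.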
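
The re-ranking step of the bipartite graph reinforcement model in item 3 above can be illustrated with a mutual-reinforcement iteration. This is a minimal sketch, not the dissertation's algorithm: the update rule, the weights `alpha` and `beta`, the edge weights, the function names `reinforce` and `top_k`, and the toy tags are all hypothetical. Each candidate keeps part of its initial score and gains support from the candidates it is linked to on the other side of the graph; scores are renormalized every round until they stop changing.

```python
def reinforce(init_a, init_b, weights, alpha=0.5, beta=0.5, iters=100, tol=1e-9):
    """Mutual reinforcement ranking on a bipartite graph (illustrative sketch).

    init_a:  initial scores of the candidate tags on side A
             (e.g. words extracted from an image's surrounding text).
    init_b:  initial scores of the candidate tags on side B
             (e.g. tags mined from visually similar images).
    weights: weights[i][j] is the edge weight between A[i] and B[j].
    alpha, beta: how much each side keeps of its own initial scores.
    """
    def normalize(v):
        total = sum(v)
        return [x / total for x in v] if total > 0 else list(v)

    a0, b0 = normalize(init_a), normalize(init_b)
    a, b = a0, b0
    for _ in range(iters):
        # Side A: keep part of the initial score, gain support from side B.
        new_a = normalize([
            alpha * a0[i]
            + (1 - alpha) * sum(weights[i][j] * b[j] for j in range(len(b)))
            for i in range(len(a))
        ])
        # Side B: same update, using the freshly updated side-A scores.
        new_b = normalize([
            beta * b0[j]
            + (1 - beta) * sum(weights[i][j] * new_a[i] for i in range(len(a)))
            for j in range(len(b))
        ])
        change = max(abs(x - y) for x, y in zip(new_a + new_b, a + b))
        a, b = new_a, new_b
        if change < tol:  # scores have stabilized
            break
    return a, b


def top_k(tags, scores, k):
    """Keep the k highest-scoring tags as the final annotations."""
    ranked = sorted(zip(tags, scores), key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in ranked[:k]]


text_tags = ["beach", "sunset", "camera"]  # hypothetical surrounding-text words
mined_tags = ["sea", "sky"]                # hypothetical mined tags
w = [[0.9, 0.2],   # beach  ~ sea, sky
     [0.3, 0.8],   # sunset ~ sea, sky
     [0.0, 0.0]]   # camera has no support from the mined tags
a, b = reinforce([0.4, 0.3, 0.3], [0.5, 0.5], w)
print(top_k(text_tags, a, 2))  # ['beach', 'sunset']
```

In the toy run, "camera" starts with the same initial score as "sunset" but has no edges to the mined tags, so it falls to the bottom of the ranking and is dropped from the final annotations.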

【Abstract】 With the prevalence of digital imaging and storage equipment, more and more images are available on the Internet. Compared with text, visual images are more vivid and easier to understand, and digital images are widely used in business, education, science and technology. Designing efficient and effective image retrieval technologies has therefore become an important research direction in academia, and a key solution to this problem is automatic image annotation. However, most automatic image annotation approaches have been studied in limited settings, e.g., designed only for small-scale, manually collected image databases, without considering real-world image annotation. As a result, when existing methods are applied in practice they encounter many problems, such as low annotation performance, a poor user experience, and an inability to handle large numbers of semantic concepts. It is therefore important both to extend current methods to real-world settings and to develop new real-world methods that overcome the shortcomings of existing ones. This dissertation studies the key problems of real-world automatic image annotation, including large-scale annotation learning, web image annotation, multilingual image annotation, and annotation refinement. Additionally, we design an image retrieval demo system based on the proposed image annotation approaches and study other key problems of image retrieval, such as image representation and image ranking. The main contributions of this dissertation are as follows:

1. Proposed an automatic image annotation method based on large-scale distance metric learning. First, we propose a discriminative distance metric learning (DDML) algorithm that improves KNN-based image annotation methods. Then, an aggregated distance metric learning (ADML) method is proposed that can train DDML in a parallel or online fashion, so ADML can handle large-scale problems. Experimental results show that the proposed method improves both the effectiveness and the efficiency of image annotation.
2. Proposed an aggregated large-scale support vector machine algorithm (ASVM) for automatic image annotation. Instead of learning from the entire data set, ASVM divides the training set into subsets, learns a series of sub-models from these subsets with an SVM, and combines them by a simple global aggregation. ASVM greatly improves the scalability of the original SVM solvers, and millions of samples can be trained in a short time.
3. Proposed a bipartite graph reinforcement model (BGRM) for web image annotation. How to use the textual information already attached to web images to help tag them is the key problem of web image annotation. BGRM extracts the surrounding text and other textual information of an image as candidate annotations, then extends them with more potentially relevant annotations by searching and mining a large-scale image database. All candidates are modeled as a bipartite graph, and a reinforcement algorithm on this graph re-ranks them. Only the candidates with the highest ranking scores are retained as the final annotations. Experimental results show that BGRM greatly improves annotation performance.
4. Proposed a statistical-model-based real-world image annotation approach (SRIA). SRIA leverages a large-scale training data set to annotate both personal and web images efficiently within a unified framework. Experimental results show that SRIA not only improves annotation performance but also speeds up annotation.
5. Proposed a cross-language image annotation framework. The framework uses large-scale multilingual web image data as its training set and provides multilingual annotations according to the user's native language. Following the idea that "two languages are more informative than one", we propose a multilingual annotation fusion algorithm (MAF) that ranks and translates candidate annotations. Experimental results show that the framework performs well.
6. Proposed an optimization-based image annotation refinement algorithm (OptTag), on which a unified image annotation framework is built. OptTag performs non-parametric annotation refinement with a 0-1 integer optimization model that uses the prior and joint local probabilities of tags, and the model can be solved efficiently by semidefinite optimization. Moreover, OptTag determines the final tags directly, whereas many previous approaches rely on predefined thresholds to discard unrelated words. Experiments demonstrate the effectiveness of OptTag.
7. Proposed an image representation based on a spatial visual topic model, and a static image ranking algorithm, SocialRank, that reflects the importance and quality of images. Combined with the proposed image annotation methods, these are used to build a real-time image retrieval demo system on a large-scale image database.

In summary, research on real-world automatic image annotation helps to understand the deep relationship between images and concepts and contributes to a unified representation model for visual information. It is of great significance not only to multimedia research but also to large-scale learning theory.
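
The divide-then-aggregate idea behind ASVM (item 2 above) can be sketched in a few lines. This is not the dissertation's implementation: the per-subset solver (a Pegasos-style stochastic subgradient method for a linear SVM), the aggregation rule (a plain average of the sub-models' weight vectors), and the names `pegasos`, `aggregated_svm` and `predict` are assumptions made for illustration; the abstract says only that the sub-models are combined by "a simple global aggregation".

```python
import random


def pegasos(data, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient steps. data: list of (features, label), label in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as part of w."""
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = list(data[i][0]) + [1.0], data[i][1]
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization step)
            if margin < 1:  # hinge loss is active: move toward y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def aggregated_svm(data, n_subsets=4, seed=0, **svm_kwargs):
    """Hypothetical stand-in for ASVM: shuffle, split into disjoint subsets,
    train one sub-model per subset, and average their weight vectors."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    subsets = [s for s in (shuffled[k::n_subsets] for k in range(n_subsets)) if s]
    # The sub-models are independent, so this loop could run in parallel.
    models = [pegasos(s, seed=seed + k, **svm_kwargs) for k, s in enumerate(subsets)]
    return [sum(m[j] for m in models) / len(models) for j in range(len(models[0]))]


def predict(w, x):
    """Sign of the linear score, with the same appended bias feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy data: two well-separated Gaussian blobs (hypothetical).
    rng = random.Random(42)
    data = ([([rng.gauss(2, 0.5), rng.gauss(2, 0.5)], 1) for _ in range(200)]
            + [([rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)], -1) for _ in range(200)])
    w = aggregated_svm(data, n_subsets=4)
    acc = sum(predict(w, x) == y for x, y in data) / len(data)
    print(f"training accuracy: {acc:.3f}")
```

Averaging is the simplest aggregation for linear models: it needs only the sub-models' weight vectors, not the raw data, and each sub-model is trained on roughly 1/n_subsets of the training set, which is what allows the subset training to be distributed.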
