基于词袋模型的图像分类算法研究

Research on Image Categorization Based on Bag-of-words Model

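【Author】 Wu Lina (吴丽娜)

【Advisors】 Luo Siwei (罗四维); Huang Yaping (黄雅平)

【Author information】 Beijing Jiaotong University, Computer Application Technology, 2013, Ph.D.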

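【Abstract (translated from the Chinese)】 With the rapid development of the Internet, digital images have become pervasive in daily life, and both their number and the variety of their categories have grown massively. Image categorization helps people organize and manage images effectively, and it has therefore received increasing attention. Among image categorization methods, the bag-of-words model, which is based on local features, achieves good classification performance and has been widely studied and applied. One important research topic for the bag-of-words model is how to create and optimize the visual vocabulary (the set of visual words) so as to represent images more effectively and improve classification performance. Another is how to use transfer learning to improve classification performance on new image categories. Transfer learning for the bag-of-words model not only avoids relearning the model for every new image category, but also suits categorization tasks with only a few samples. With the goal of creating a visual vocabulary suited to transfer learning, this thesis studies methods for optimizing and improving the visual vocabulary and proposes combining multiple visual words into visual phrases by means of local spatial information. Such visual phrases can more effectively discover and represent the features shared by different images, remove the "semantic ambiguity" of visual words, and be transferred to the visual vocabulary of a new image category. The research falls into two parts. The first studies how to obtain effective, discriminative visual words and visual phrases that contain spatial information, which supply the information needed for image categorization (the appearance information and the spatial information of features). The second studies how, when learning a new image category and especially when only a few image samples are available, knowledge of already learned categories can be reused by transferring visual phrases, so as to speed up learning of the new category and improve classification performance. The main work and contributions of the thesis are threefold.

First, a weighted minimal-redundancy-maximal-relevance (WMR-MR) criterion is proposed. Starting from information theory, the WMR-MR criterion jointly evaluates the relevance and redundancy of the visual vocabulary in classification, based on the correlation between visual words and image categories and the correlation between pairs of visual words. The vocabulary is optimized by deleting words that are weakly related to the category and redundant with other words in the vocabulary, which keeps the discriminative visual words while shrinking the vocabulary. With this criterion, a relatively small visual vocabulary can describe the image set while preserving classification performance, which addresses the high computational complexity and inter-word redundancy caused by an overly large vocabulary. Such a small vocabulary also lays the foundation for creating visual phrases and for the transfer learning of visual phrases.

Second, a method for creating visual phrases that contain local spatial information is proposed. The spatial positions of local features are recorded when the features are extracted, and visual phrases are built from stable neighbor relations between local features, yielding a visual phrase model that represents local spatial information. Compared with global spatial information, visual phrases with local spatial information handle intra-class variation more flexibly and are more robust. Visual phrases also help remove the ambiguity that may arise when any one of their words is used on its own, making the image description more reliable. Visual words, which describe the appearance information of local features, and visual phrases, which describe local spatial information, together form two cues for image categorization. Because different image categories have different spatial structure, the algorithm balances the two cues with a weight, so that it can be applied to categorization tasks over different image categories.

Third, a transfer learning algorithm based on visual phrases is proposed. Visual phrases are used to describe the features shared by different image categories, making full use of existing knowledge to help learn new image categories. Experiments show that transferring visual phrases improves the classification performance of the bag-of-words model more effectively than transferring visual words directly. While learning a new image category, the algorithm iteratively adjusts the transferred visual phrases and retains those that benefit classification of the new images, so that the classifier also performs well on the new category. Compared with classification algorithms that relearn the visual vocabulary, this transfer algorithm uses existing knowledge effectively and achieves good classification performance even when the new category has few training samples.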
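
The abstract describes WMR-MR only in outline and does not give its formula, so the following is a minimal sketch of the general idea rather than the thesis's criterion: a standard greedy minimal-redundancy-maximal-relevance selection over visual words, using mutual information computed from binary word-presence vectors. The function names, the toy data, and the weight `alpha` on the redundancy term are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def select_words(presence, labels, k, alpha=1.0):
    """Greedily pick k words by relevance minus alpha * mean redundancy.

    presence: dict mapping a word name to a list of 0/1 flags, one per image.
    labels:   list of class labels, one per image.
    alpha:    hypothetical weight on the redundancy term (an assumption,
              not a parameter taken from the thesis).
    """
    relevance = {w: mutual_information(col, labels)
                 for w, col in presence.items()}
    selected = []
    candidates = set(presence)

    def score(word):
        if not selected:
            return relevance[word]
        redundancy = sum(mutual_information(presence[word], presence[s])
                         for s in selected) / len(selected)
        return relevance[word] - alpha * redundancy

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: 8 images, 4 classes, 4 candidate visual words.
    labels = [0, 0, 1, 1, 2, 2, 3, 3]
    presence = {
        "wa":     [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0,1} from {2,3}
        "wa_dup": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of "wa"
        "wb":     [0, 0, 1, 1, 0, 0, 1, 1],  # complementary to "wa"
        "noise":  [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
    }
    print(select_words(presence, labels, k=2))  # ['wa', 'wb']
```

In this toy case the duplicate word is skipped because its redundancy with the already selected word cancels its relevance, while the complementary word is kept. This is the behaviour the abstract attributes to pruning redundant, weakly relevant words.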

【Abstract】 With the rapid development of the Internet, digital images have become a large part of daily life, and both their number and the variety of their categories have grown massively. Image categorization has gained increasing attention because it helps people organize and manage images effectively. The bag-of-visual-words (BOV) model, which categorizes images based on local features, has been shown to yield state-of-the-art results. One important research topic for the BOV model is how to create and improve the vocabulary so as to represent images effectively and improve categorization performance. Another is transfer learning for BOV, which avoids learning the BOV model from scratch for each category and retains good performance when only a few training images are available. This thesis analyzes each step of the BOV model (feature extraction, feature description, vector quantization, classifier learning) and improves the vocabulary to suit transfer learning. It studies methods for optimizing and improving the visual vocabulary, with the aim of creating a visual vocabulary for transfer learning, and proposes creating visual phrases by composing several visual words using spatial information. Visual phrases can find and represent local spatial information common to different image categories, avoid semantic ambiguity, and be transferred to the visual vocabulary of a novel image category. The research has two parts. The first is how to obtain a discriminative vocabulary and a set of phrases with spatial information, which together provide the necessary knowledge (appearance information and spatial information). The second is how to use learned knowledge to speed up the learning of a new category and improve its performance, especially when there are only a few training images. The main contributions of the thesis are summarized as follows:

1. A weighted minimal-redundancy-maximal-relevance (WMR-MR) criterion is defined. The WMR-MR criterion considers both the redundancy between one word and another and the relevance between a word and the category. The algorithm improves a vocabulary by eliminating redundant words that have little relevance to the category. The result is a discriminative vocabulary of relatively small size, which addresses the problem that a large vocabulary leads to high computational cost and redundant words. The vocabulary obtained by this algorithm provides a basis for creating visual phrases and for the transfer learning of phrases.

2. An algorithm for creating visual phrases with local spatial information is proposed. Position information is obtained when local features are extracted, and stable neighbor relations between visual words are then modeled as visual phrases. Compared with global spatial information, the local spatial information in visual phrases copes better with intra-class variation and is therefore more robust. Moreover, visual phrases help eliminate the ambiguity that arises when a visual word is used for image categorization on its own, so they are more reliable than individual words. Visual words, which represent the appearance information of local features, and visual phrases, which represent local spatial information, form two sources of information for categorization. The algorithm balances these two sources by adjusting a weight to suit different image categories, so it can be applied to different image categorization tasks.

3. A transfer learning algorithm based on visual phrases is proposed. The algorithm describes the common features of various image categories with visual phrases, aiming to use learned knowledge to help learn a novel image category. During the learning of a novel category, the algorithm iteratively adjusts the visual phrases to be transferred, retaining those that help image categorization. The retained visual phrases give the classifier for the novel category good performance after transfer learning. Compared with relearning the vocabulary, this transfer learning algorithm makes effective use of learned knowledge and achieves good performance even when the novel category has only a few images.
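
To make the notion of a visual phrase built from stable neighbor relations more concrete, here is a small sketch under stated assumptions. The abstract does not define "stable" or "neighbor", so this example assumes a phrase is an unordered pair of distinct visual words whose features lie within a fixed pixel `radius` of each other, and that a pair is stable if it occurs in at least `min_support` images. All names and thresholds are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def image_pairs(features, radius):
    """Set of unordered word pairs whose features lie within `radius`.

    features: list of (word_id, x, y) tuples for one image.
    """
    pairs = set()
    for (w1, x1, y1), (w2, x2, y2) in combinations(features, 2):
        if w1 != w2 and math.hypot(x1 - x2, y1 - y2) <= radius:
            pairs.add((min(w1, w2), max(w1, w2)))
    return pairs


def mine_phrases(images, radius, min_support):
    """Keep word pairs that are spatial neighbors in >= min_support images."""
    support = Counter()
    for features in images:
        support.update(image_pairs(features, radius))
    return sorted(p for p, count in support.items() if count >= min_support)


def phrase_histogram(features, phrases, radius):
    """Binary indicator per phrase: does the phrase occur in this image?"""
    present = image_pairs(features, radius)
    return [1 if p in present else 0 for p in phrases]


if __name__ == "__main__":
    # Toy images: each feature is (visual word id, x, y).
    images = [
        [(1, 0, 0), (2, 1, 0), (3, 50, 50)],
        [(1, 10, 10), (2, 10, 12), (3, 11, 80)],
        [(1, 0, 0), (3, 2, 0), (2, 90, 90)],
    ]
    phrases = mine_phrases(images, radius=5, min_support=2)
    print(phrases)                                      # [(1, 2)]
    print(phrase_histogram(images[2], phrases, radius=5))  # [0]
```

Under these assumptions, phrases mined from previously learned categories could be evaluated on images of a novel category with `phrase_histogram`, and phrases that do not help classification could then be dropped in later iterations, in the spirit of the iterative adjustment described in item 3.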
