Research on Image Semantic Classification Method Based on Salient Regions
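【Author】 Liang Shangsong (梁上松);
【Advisor】 He Dongjian (何东健);
【Author information】 Northwest A&F University (西北农林科技大学), Computer Architecture, 2011, Master's thesis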
【Abstract】 With the widespread use of digital image acquisition devices, the number of digital images is growing exponentially. How to organize, classify, and retrieve massive image collections quickly and efficiently has become a valuable research topic. Building on image salient regions, this thesis studies the construction of image bags of visual words, image dissimilarity measurement, and image semantic classification, and tests the resulting classification algorithm. The main content of this research is as follows:
(1) Global image features carry too much redundant information and struggle to express the main category information of an image. To address this, we propose a method for constructing the bag of visual words of an image. First, the Harris-Laplace region detector is used to obtain the salient regions of the image, and a feature descriptor describes each salient region as a feature vector. Next, the affinity propagation algorithm clusters these feature vectors. Finally, the exemplar of each cluster is taken as a visual word of the image, and the ratio of the number of feature vectors in that cluster to the total number of feature vectors in the image is taken as the frequency of that visual word. The visual words and their frequencies together form the bag of visual words of the image. Experimental results show that the constructed bag of visual words represents the main information of an image.
(2) To measure the dissimilarity between two images more reasonably, we propose an EMD (Earth Mover's Distance) dissimilarity measure based on bags of visual words. The measure treats each visual word as a bin of a histogram and the frequency of the visual word as the value of the corresponding bin. First, a dissimilarity matrix is constructed that stores the Euclidean distances between the visual words of the two images. Next, the unique flow between the two bags of visual words that satisfies a set of constraints is found. Finally, the dissimilarity between the two images is obtained. Experimental results show that this measure quantifies the dissimilarity between two images reasonably well and has a positive effect on image semantic classification.
(3) To address the image semantic classification problem, we propose a multi-descriptor, multi-nearest-neighbor image classification algorithm. First, the Harris-Laplace detector is used to obtain the salient regions of each image, and each of several image feature descriptors describes these regions as feature vectors. Next, for each descriptor, the affinity propagation algorithm is applied to the feature vectors to form a descriptor-specific bag of visual words, and the EMD dissimilarity measure based on bags of visual words is used to find the nearest neighbor of an unlabeled image in every category. Finally, the unlabeled image is classified by combining the feature descriptors and using the dissimilarities between the unlabeled image and its corresponding nearest neighbors.
(4) The algorithms are implemented in Matlab, Java, and C++ and tested on two well-known image databases, 1000-image and Caltech-101 Object. Experimental results on 1000-image show that combining multiple image feature descriptors raises the mean classification accuracy by 5% to 30% compared with using a single feature descriptor. Experimental results on Caltech-101 Object show that when the number of labeled images is small, our classification algorithm outperforms some state-of-the-art algorithms, and when the number of labeled images is large, it performs about as well as some state-of-the-art algorithms.
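The word-frequency rule in part (1) of the abstract can be illustrated with a minimal Python sketch. It assumes the clustering step (affinity propagation in the thesis) has already run and has assigned each feature vector to an exemplar. The names `build_visual_bag` and `exemplar_of` and the toy data are hypothetical and do not come from the thesis.

```python
from collections import Counter


def build_visual_bag(features, exemplar_of):
    """Build one image's bag of visual words as (visual_word, frequency) pairs.

    features    -- list of feature vectors (tuples), one per salient region
    exemplar_of -- exemplar_of[i] is the index into `features` of the exemplar
                   (cluster centre) that feature i was assigned to; this is
                   assumed to be the output of a separate clustering step
    """
    if len(features) != len(exemplar_of):
        raise ValueError("one exemplar index is required per feature vector")
    if not features:
        return []
    counts = Counter(exemplar_of)   # cluster size per exemplar index
    total = len(features)           # all feature vectors in the image
    # Visual word = exemplar vector; frequency = cluster size / total.
    return [(features[idx], counts[idx] / total) for idx in sorted(counts)]


# Toy data (invented for illustration): five 2-D feature vectors, two clusters.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 4.9)]
exemplar_of = [0, 0, 0, 3, 3]

bag = build_visual_bag(features, exemplar_of)
for word, freq in bag:
    print(word, freq)
```

Because every feature vector belongs to exactly one cluster, the frequencies of a bag sum to 1, which is what allows the bag to be treated as a normalized histogram in part (2).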
【Key words】 image classification; salient region; bag-of-visual word; EMD; high-level semantics;
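For reference, the EMD named in the keywords and used in part (2) of the abstract is conventionally formulated as a transportation problem. The block below gives that standard formulation, not necessarily the exact one used in the thesis. The symbols $P$, $Q$, $p_i$, $q_j$, $w$, $d_{ij}$, and $f_{ij}$ are notation assumed here: $P$ and $Q$ are the bags of visual words of two images, $p_i$ and $q_j$ their visual words, $w_{p_i}$ and $w_{q_j}$ the word frequencies, $d_{ij}$ the entries of the Euclidean distance matrix, and $f_{ij}$ the flow.

```latex
% P = {(p_i, w_{p_i})}, i = 1..m  and  Q = {(q_j, w_{q_j})}, j = 1..n
\begin{align*}
d_{ij} &= \lVert p_i - q_j \rVert_2 \\
\min_{f}\ & \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{s.t.}\ & f_{ij} \ge 0, \qquad
  \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad
  \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \\
& \sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}
  = \min\Bigl(\sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j}\Bigr) \\
\mathrm{EMD}(P, Q) &= \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}\, d_{ij}}
                           {\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}}
\end{align*}
```

When the frequencies in each bag sum to 1, as they do under the construction in part (1), the total flow is 1, the two inequality constraints hold with equality, and the EMD equals the minimum transport cost itself.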