
Visual Object Representation Based on Salient Local Features

【Author】 Wang Yanjie (王彦杰)

【Advisor】 Jia Yunde (贾云得)

【Author Information】 Beijing Institute of Technology, Computer Application Technology, 2010, PhD

【摘要 (Abstract)】 Visual object representation bridges low-level image information and high-level semantic concepts, and plays a key role in computer vision tasks such as image perception and scene understanding. Representations based on local features are expressive and relatively robust to image occlusion and background clutter, and have attracted considerable attention in recent years. Starting from the statistical modeling and discriminative learning of local features, this dissertation studies local-feature-based visual object representation. The main topics are the statistical modeling and learning of visual words, category salient local feature detection, and the cooperative representation of universal and category visual words. Statistically modeling visual words captures the variation of local features and represents visual objects more accurately. Detecting category salient local features allows objects to be located rapidly in images and lays the foundation for class-specific object representations. Combining category and universal visual words exploits the strengths of both, improves the class discriminability of the representation, and benefits image classification.

A statistical modeling and discriminative learning method for visual words is proposed. Local features belonging to the same visual word are assumed to follow a Gaussian mixture distribution; an improved Max-Min posterior Pseudo-probabilities (MMP) discriminative learning method estimates the mixture from samples, and an expectation-maximization procedure with a minimum description length criterion determines the number of mixture components. Posterior pseudo-probabilities then measure the similarity between local features and visual words, yielding a soft-histogram representation. Two soft-histogram strategies are given: a classification-based soft histogram, which assigns each local feature to the visual word of maximum similarity, and a completely soft histogram, which assigns each local feature to all visual words according to similarity. Experiments on the Caltech-4 and PASCAL VOC 2006 databases show that visual words modeled by Gaussian mixtures outperform those modeled by single Gaussians or by cluster centers, that discriminative learning improves the accuracy of similarity measurement and object recognition, and that, under the same conditions, soft histograms outperform hard histograms.

A category salient local feature detection method is proposed. It combines appearance and context information to define the category saliency of local features, and extracts, in a sequential manner, local features that are salient in both appearance and context. Appearance saliency is measured by the posterior probability that a local feature belongs to an object category, approximated by an easily computed posterior pseudo-probability whose parameters are learned by the MMP method from training images carrying only category labels. On top of the detected salient features, a neighborhood co-occurrence star model is built, and context saliency is defined by the co-occurrence regularities of visual words in a feature's neighborhood. The detection algorithm is applied to object recognition and localization: the distribution of salient local features within initial candidate windows is used to quickly select effective candidates and discard large numbers of irrelevant windows, improving both the efficiency and the quality of object localization. Salient feature detection and localization experiments on the INRIA horse and PASCAL VOC 2006 databases show that the detected category salient local features discriminate between object categories and yield good localization results.

A visual object representation method based on the cooperation between category and universal visual words is proposed. Local features belonging to an object category are assumed to follow a Gaussian mixture distribution, and each mixture component is treated as a category visual word. All category visual words are clustered with k-means to produce universal visual words, and the correspondence between universal and category visual words is recorded. During image representation, the average posterior pseudo-probability of all local features belonging to each universal visual word is computed and assigned to its corresponding category visual words; this value reflects how likely the universal word and its category words are to appear in the image. Arranging these values for all category visual words of a class in a fixed order yields a class-specific feature vector. The representation has strong class discriminability: object recognition experiments on PASCAL VOC 2006 and Caltech-101, and image annotation and retrieval experiments on Corel-5K, all achieve good classification results.
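The two soft-histogram strategies described above can be sketched in a few lines. This is a hypothetical 1-D illustration: each visual word is modeled by a single Gaussian (a one-component stand-in for the dissertation's Gaussian mixtures), and an unnormalized Gaussian score plays the role of the posterior pseudo-probability; the means and widths are illustrative.

```python
import math

# (mean, std) per visual word -- illustrative values, not learned models
WORDS = [(0.0, 1.0), (5.0, 1.0), (10.0, 2.0)]

def pseudo_prob(x, mean, std):
    """Gaussian score used here in place of a posterior pseudo-probability."""
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))

def soft_histograms(features):
    """Return (classification-based, completely soft) normalized histograms."""
    hard = [0.0] * len(WORDS)
    soft = [0.0] * len(WORDS)
    for x in features:
        sims = [pseudo_prob(x, m, s) for m, s in WORDS]
        best = max(range(len(WORDS)), key=sims.__getitem__)
        hard[best] += sims[best]       # feature votes for its best word only
        for i, s in enumerate(sims):   # feature votes for every word
            soft[i] += s
    return ([v / sum(hard) for v in hard],
            [v / sum(soft) for v in soft])
```

For example, `soft_histograms([0.1, 4.8, 9.5])` assigns each feature entirely to its nearest word in the first histogram, while the second spreads a small share of each feature over all three words.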

【Abstract】 Visual object representation bridges the gap between low-level image features and high-level semantic concepts, and plays an important role in computer vision tasks such as image perception and scene understanding. Visual object representation based on local features is expressive and robust to occlusion and background clutter, and has attracted a lot of attention in recent years. This dissertation focuses on the statistical modeling and discriminative learning of local features for visual object representation, including the statistical modeling and discriminative learning of visual words, category salient local feature detection, and the cooperation between category and universal visual words. By statistically modeling visual words, the accuracy of visual object representation can be improved. By detecting category salient local features, object positions in images can be located rapidly. Through the cooperation between category and universal visual words, object classifiers gain better discriminative ability.

An approach to the statistical modeling and discriminative learning of visual words is proposed. The local features belonging to each visual word are assumed to follow a Gaussian mixture model (GMM), learned from training data by Max-Min posterior Pseudo-probabilities (MMP), a discriminative learning method. The similarities between each visual word and the corresponding local features are computed, summed, and normalized to construct a soft histogram. Two representation strategies are evaluated in the object recognition experiments: the classification-based soft histogram, in which each local feature is assigned only to the visual word of maximum similarity, and the completely soft histogram, in which each local feature is assigned to all the visual words.
The experiments are conducted on the Caltech-4 and PASCAL VOC 2006 databases.

An algorithm is presented to detect category salient local features in images. Both the category appearance saliency and the category context saliency of local features are considered. First, the category appearance saliency of a local feature is determined by the posterior probability that it belongs to a specific object category. Then, local features with category appearance saliency are verified by the contextual information in their neighborhoods: a co-occurrence star model measures category context saliency based on the co-occurrence relationships between visual words. The proposed algorithm is applied to object localization and recognition. Experimental results on the INRIA horse, PASCAL VOC 2006, and Caltech-101 datasets show that it improves the efficiency and effectiveness of object localization as well as the accuracy of object recognition.

An image representation and classification algorithm based on the cooperation between category and universal visual words is described. Category visual words are generated by assuming that the local features in the training images of a class follow a GMM; the number of visual words for a class is determined automatically by the minimum description length criterion. All category visual words are clustered to obtain universal visual words. A category-specific image representation is defined through the cooperation between the two types of visual words: the resulting feature vectors of an image vary from class to class in both dimensionality and elements. The proposed representation is integrated into MMP learning to perform image classification, and the resulting classifier is evaluated on object categorization and automatic image annotation.
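The two-stage salient-feature filter described above (appearance saliency first, then context verification) can be sketched as follows. This is a hypothetical sketch: the threshold, the word ids, and the co-occurrence table are illustrative stand-ins for the learned MMP parameters and the co-occurrence star model.

```python
# Word pairs that frequently co-occur for the category -- illustrative only
CO_OCCUR = {(1, 2), (1, 3)}

def detect_salient(features, app_thresh=0.5, ctx_min=1):
    """features: list of (word_id, appearance_score, neighbor_word_ids)."""
    salient = []
    for word, score, neighbors in features:
        if score < app_thresh:            # stage 1: appearance saliency
            continue
        # stage 2: context saliency -- count neighbors whose visual word
        # co-occurs with the center word (in either order), as in the
        # star model centered on the candidate feature
        support = sum((word, n) in CO_OCCUR or (n, word) in CO_OCCUR
                      for n in neighbors)
        if support >= ctx_min:
            salient.append((word, score))
    return salient
```

A candidate window containing many such surviving features would then be kept, while windows with few of them can be discarded before any expensive localization step.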
Experimental results on the PASCAL VOC 2006, Caltech-101, and Corel-5K datasets show that the proposed method is effective and promising.
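The cooperation between category and universal visual words described above can be sketched as below. This is a hypothetical sketch: category visual words are reduced to 1-D means, a plain k-means groups them into universal words, and the recorded universal-to-category mapping distributes each universal word's average pseudo-probability to the category words of one class. The class names, word values, and probabilities are all illustrative.

```python
# Category visual words (1-D means) for two illustrative classes
CATEGORY_WORDS = {"horse": [0.2, 5.1], "car": [4.9, 10.0]}

def build_universal(k=2, iters=10):
    """Cluster all category visual words into k universal visual words."""
    pts = [(cls, w) for cls, ws in CATEGORY_WORDS.items() for w in ws]
    centers = [w for _, w in pts[:k]]      # first k points as seeds
    assign = []
    for _ in range(iters):                 # plain k-means in 1-D
        assign = [min(range(k), key=lambda j: abs(w - centers[j]))
                  for _, w in pts]
        for j in range(k):                 # recompute cluster centers
            members = [w for (_, w), a in zip(pts, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    # mapping: universal word index -> the category words assigned to it
    mapping = {j: [p for p, a in zip(pts, assign) if a == j]
               for j in range(k)}
    return centers, mapping

def class_vector(cls, universal_probs, mapping):
    """Distribute each universal word's average pseudo-probability to the
    category words of `cls` mapped to it (class-specific representation)."""
    return [universal_probs[j]
            for j, members in sorted(mapping.items())
            for mcls, _ in members if mcls == cls]
```

Because only the target class's category words receive values, the resulting vector's length and elements differ from class to class, matching the class-specific representation described in the abstract.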
