

Research on Scene Classification Technologies with the Local Context Feature and Spatial Pyramid Model

【Author】 Tu Xiaolei (涂潇蕾)

【Advisor】 Hu Zhengping (胡正平)

【Author Information】 Yanshan University (燕山大学), Communication and Information Systems, 2012, Master's thesis

【Abstract】 Scene image classification is the process of automatically classifying images that carry different semantic content, following the principles of human visual perception. It provides important environmental cues for object recognition and other vision tasks, and has become an active research topic in computer vision. Much like words in text, a visual vocabulary can be used to model images, yielding a mid-level representation that describes scene semantics effectively. Building on the bag-of-words model for scene images, this thesis studies feature extraction, visual vocabulary construction, and bag-of-visual-words representation, as follows.

First, traditional visual words are formed from independent local visual features and ignore the neighborhood relations among image features. To overcome this shortcoming, we construct a class of local visual features that carry multi-directional context information, build the visual vocabulary with a category-specific generation strategy, and combine it with the spatial pyramid model to perform scene classification. The method combines similarity in the feature domain with contextual relations in the spatial domain while discriminating between scene categories. Experiments show that it achieves good classification results and performs better than existing methods.

Second, because contextual relations play an important role in image feature representation, we further study the effectiveness of image neighborhood information. Using the flatness of image regions, we propose an unsupervised scheme for extracting adaptive context features. Visual words are generated separately for each image category, and a spatial pyramid model based on sparse coding encodes the visual features as a joint distribution over visual words, which is then used for scene classification. The method selects the more effective context features in an image, and experiments show a clear improvement in classification accuracy.

Finally, to further exploit visual properties of different aspects of scene images, we introduce the local self-similarity descriptor on top of the adaptive context features and construct an image description that combines the two complementary features. The bag-of-words description is formed with the category-specific vocabulary strategy and sparse coding, and a discriminative spatial pyramid representation is then built using partial least squares to perform scene classification. The method makes the bag-of-words description more adaptable and discriminative. Experiments show that it achieves higher accuracy and performs especially well on complex indoor scenes.
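As a rough illustration of the bag-of-words and spatial pyramid pipeline that the abstract builds on, the sketch below assigns local descriptors to visual words and concatenates per-cell word histograms over a pyramid of grids. It is not the thesis's method: it uses plain nearest-word (hard) assignment instead of sparse coding, leaves out the context features, category-specific vocabularies, and partial least squares, and applies no per-level weights. The function names `assign_words` and `spatial_pyramid_histogram`, the parameter names, and the random test data are all assumptions made for this example.

```python
import numpy as np


def assign_words(descriptors, vocabulary):
    """Hard-assign each descriptor to its nearest visual word (Euclidean)."""
    # Squared distances between every descriptor (n, d) and every word (k, d).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def spatial_pyramid_histogram(positions, words, vocab_size, levels=3):
    """Concatenate per-cell word histograms over a pyramid of grids.

    positions: (n, 2) array of (x, y) coordinates normalized to [0, 1).
    words:     (n,) array of visual-word indices in [0, vocab_size).
    Level l splits the image into 2**l x 2**l cells (unweighted here).
    """
    parts = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of each feature along x and y at this level.
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        cell_id = cy * cells + cx
        hist = np.zeros((cells * cells, vocab_size))
        # Count one occurrence per (cell, word) pair, repeats included.
        np.add.at(hist, (cell_id, words), 1.0)
        parts.append(hist.ravel())
    feature = np.concatenate(parts)
    total = feature.sum()
    return feature / total if total > 0 else feature


# Hypothetical data: 50 random 16-D descriptors and an 8-word vocabulary.
rng = np.random.default_rng(0)
vocabulary = rng.normal(size=(8, 16))
descriptors = rng.normal(size=(50, 16))
positions = rng.random(size=(50, 2))

words = assign_words(descriptors, vocabulary)
feature = spatial_pyramid_histogram(positions, words, vocab_size=8)
print(feature.shape)  # (168,) = 8 words * (1 + 4 + 16) cells
```

The resulting vector could be passed to any standard classifier. In a real system the vocabulary would be learned from training descriptors, for example by clustering, rather than drawn at random.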

  • 【Online Publication Contributor】 Yanshan University (燕山大学)
  • 【Online Publication Issue】 2012, Issue 11