
Research on Scene Classification for Image Semantic Description

Image Semantic Representation Based Scene Classification Research

【Author】 Gu Guanghua (顾广华)

【Advisor】 Zhao Yao (赵耀)

【Author Information】 Beijing Jiaotong University, Signal and Information Processing, 2013, PhD

【Abstract (Chinese)】 How to enable computers to classify and manage massive image data efficiently, in a way consistent with human understanding, has become an urgent problem in image understanding. Scene analysis and understanding make semantic image classification possible, and scene classification is clearly identified as a key topic within it. The main contributions of this thesis are:

(1) A scene classification method based on feature fusion weighted by local entropy. Since different feature descriptors suit different types of scene images, this thesis fuses two local feature descriptors to increase the discriminative power of scene image descriptions. First, the complexity of a scene image is quantified by computing its local entropy, from which a flatness measure is defined; the flatness of a scene category is obtained by accumulating the flatness of every image in that category. Second, two local feature descriptors, suited respectively to smooth regions and to varying regions, are extracted, and an image histogram description is built from each. Third, the flatness of the scene category is used to compute weighting coefficients for the two local features, and the two histogram descriptions formed from the independent local descriptors are fused by weighted combination to obtain the best description of the scene category. Finally, a probabilistic generative model is trained to perform scene classification. Experimental results show that the method generalizes across different types of image feature descriptions.

(2) A scene classification method based on spatial pyramid image description over superpixel lattices. To remedy the traditional bag-of-words model's neglect of spatial information, this thesis incorporates spatial information through contextual features and spatial pyramid image description. First, multi-scale contextual features are constructed so that local spatial structure is included in the feature description. Second, the image is partitioned by superpixel lattices, with the lattice resolution determined by the pyramid level. Third, for every sub-block produced by the superpixel lattices at each level, a histogram description is generated from the visual dictionary, and the sub-block histograms are combined with appropriate weights into a histogram description of the whole image. Finally, a classifier is trained to complete scene classification. The superpixel lattice partition avoids forcibly splitting objects in the image, preserving the semantic consistency of objects within each sub-region. Experimental results verify the benefit of contextual information and superpixel lattice partitioning for scene classification.

(3) A scene classification method based on a locality-constrained linear coding feature mapping. After visual features are extracted and clustered into a visual codebook, the features are mapped onto the codebook to form the image description. This thesis proposes a locality-constrained linear coding feature mapping based on sum-max pooling: the t codewords with the highest probabilities are linearly weighted and averaged to give the coding result. The influence of the value of t on scene classification performance is analyzed, as is the relationship between codebook length and classification performance. Experiments show that the method improves the correlation among codewords and the robustness of the feature mapping, achieving good scene classification performance.
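Contribution (1) above can be sketched in code. The following Python fragment is a minimal illustration under stated assumptions, not the thesis's implementation: the window size, the flatness definition (one minus normalized local entropy), and the concatenation-based fusion rule are all assumptions made here for concreteness.

```python
import numpy as np

def local_entropy(img, win=9):
    """Mean Shannon entropy of gray-level histograms over non-overlapping
    windows. `img` is a 2-D uint8 array; `win` is an assumed window size."""
    h, w = img.shape
    ents = []
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            patch = img[i:i + win, j:j + win]
            p = np.bincount(patch.ravel(), minlength=256).astype(float)
            p /= p.sum()
            p = p[p > 0]                       # drop empty bins before log
            ents.append(-(p * np.log2(p)).sum())
    return float(np.mean(ents))

def flatness(img, win=9):
    """Assumed definition: low local entropy means a flat (smooth) image.
    Normalized by 8 bits, the maximum entropy of a 256-level patch."""
    return 1.0 - local_entropy(img, win) / 8.0

def fuse_histograms(h_smooth, h_texture, category_flatness):
    """Assumed fusion rule: weight the descriptor suited to smooth regions
    by the category flatness, the texture-oriented descriptor by its
    complement, and concatenate into one image representation."""
    a = category_flatness
    return np.concatenate([a * h_smooth, (1.0 - a) * h_texture])
```

A uniform image has zero local entropy and therefore flatness 1.0, so its category pushes the weight toward the smooth-region descriptor, which matches the intent described in the abstract.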

【Abstract】 How to classify and manage the vast amount of image data by computer, in a way consistent with human understanding, has become an urgent problem in the image understanding area. Scene analysis and understanding make semantic image classification possible, and scene classification is clearly identified as a key issue within it. This thesis builds middle-level semantic image representations on top of visual image features, establishing and modeling middle-level semantic concepts of images to bridge the semantic gap between low-level features and high-level semantics. The thesis achieves the following research results:

(1) This thesis proposes a scene classification algorithm based on feature fusion weighted by local entropy. Because different feature descriptors fit different scene images, the thesis fuses two local feature descriptors to strengthen the discrimination of scene image descriptions. First, the complexity of a scene image is analyzed quantitatively via its local entropy, and the flatness of the image is defined; the flatness of each scene category is then calculated by summing the flatness of every image in that category. Second, two local feature descriptors, one suited to smooth regions and one to changing regions, are extracted, and an image histogram representation is constructed from each. Third, weighting coefficients are obtained from the flatness of the scene category, and the optimal image representation is produced by weighted fusion of the two histogram representations. Finally, a generative model is trained to perform scene classification. Experimental results show that the method has some universality across different image feature descriptions.

(2) This thesis presents a scene classification method based on spatial pyramid image representation by superpixel lattices. Because traditional image representation based on the bag-of-words (BoW) model ignores spatial information, the thesis adds it by applying contextual features and spatial pyramid image representation. First, multi-scale contextual features are constructed to incorporate local spatial structure into the feature description. Second, the superpixel lattice method segments the image, with resolutions determined by the pyramid levels. Third, a histogram representation of each sub-block obtained from the superpixel lattices at each level is formed from the visual dictionary, and these partial representations are weighted to form the histogram representation of the whole image. Finally, a classifier is trained to complete scene image classification. The superpixel-lattice segmentation avoids forcibly splitting the objects in the image and so preserves the semantic consistency of objects within each sub-region. Experimental results demonstrate the superiority of contextual information and superpixel lattice segmentation in the scene classification task.

(3) This thesis proposes a scene classification method based on feature mapping by locality-constrained linear coding. Visual features are extracted from the images and clustered into a visual codebook, and feature mapping onto the codebook forms the image representation. The feature mapping in this thesis is a locality-constrained linear coding based on sum-max pooling: the t codewords with the highest probabilities are selected, linearly weighted, and averaged to give the coding result. The thesis discusses how scene classification performance depends on the value of t and on the codebook length. Experiments prove that the proposed method improves the correlation of codewords and the robustness of the feature mapping, achieving good scene classification performance.
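Contribution (3) can likewise be sketched. The fragment below is an assumed reading of the top-t coding step, not the thesis's exact formulation: codeword "probabilities" are approximated here by a softmax over negative squared distances (the scale `beta` is an assumption), the t most probable codewords keep their weights, and average pooling stands in for the pooling stage.

```python
import numpy as np

def llc_top_t_code(descriptor, codebook, t=5, beta=1.0):
    """Map one local descriptor onto a codebook: keep the t codewords with
    the highest (assumed softmax) probabilities, renormalize their weights,
    and zero the rest, giving a sparse response vector over the codebook."""
    d2 = ((codebook - descriptor) ** 2).sum(axis=1)   # squared distances to each codeword
    p = np.exp(-beta * (d2 - d2.min()))               # unnormalized softmax (shifted for stability)
    p /= p.sum()
    code = np.zeros_like(p)
    top = np.argsort(p)[-t:]                          # indices of the t most probable codewords
    code[top] = p[top] / p[top].sum()                 # linear weights, renormalized
    return code

def image_representation(descriptors, codebook, t=5):
    """Pool the per-descriptor codes (here by averaging, an assumed choice)
    into one image-level representation over the codebook."""
    codes = np.stack([llc_top_t_code(d, codebook, t) for d in descriptors])
    return codes.mean(axis=0)
```

Each per-descriptor code has at most t nonzero entries and sums to one, so the pooled image representation is itself a normalized histogram over the codebook, which is the form the classifier in the abstract consumes.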
