
基于视觉感知和相关反馈机制的图像检索算法研究

Researches on Image Retrieval Algorithms Based on Visual Sensation and Relevant Feedback Mechanism

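【Author】 Huang Chuanbo (黄传波)

【Advisor】 Jin Zhong (金忠)

【Author information】 Nanjing University of Science and Technology, Pattern Recognition and Intelligent Systems, 2011, PhD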

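【Abstract (translated from the Chinese)】 With the rapid growth in the number of digital images, retrieving the required information from massive image collections efficiently and quickly has become an important problem in image applications. Image content falls into two categories: visual content and semantic content. Visual content is what is usually called the physical representation of an image, such as texture, color, shape, and orientation; semantic content is the information an image conveys, such as its subject, scene, and the people in it. To retrieve images, the image content must first be understood and then represented formally. Because image content is complex and human cognition is subjective, understanding and representing images correctly is difficult. This thesis studies image feature extraction and techniques for overcoming the "semantic gap" in content-based image retrieval (CBIR), and systematically discusses image information representation, feature extraction, and relevance feedback. These topics are current research priorities in image processing and information retrieval. The specific contributions are as follows.

To make full use of the saliency information of the visual attention model together with the good sparsity of the Contourlet transform and its ability to capture edge information accurately, an image retrieval algorithm based on visual attention and the Contourlet transform is proposed. First, the Itti visual attention model is applied to the color, intensity, and orientation features to obtain feature maps, and Contourlet multiscale analysis is applied to obtain the subbands of the image. Then Local Binary Pattern Histogram Fourier (LBP-HF) features extract the combined saliency information in each feature map, while the first- and second-order moments of each directional subband extract texture information. Combining the saliency and texture information yields image retrieval based on multi-feature fusion.

Driven by the attention mechanism, a visual attention model highlights the key information in an image according to the user's perception while suppressing secondary information. Building on the Itti model, two visual-attention-based feature extraction algorithms are therefore proposed, each constructing different primary visual features. Algorithm 1 takes into account the relationship between texture features and visual perception and constructs a roughness map that serves as a primary visual feature of the attention model. The improved model yields 50 visual feature maps. LBP-HF is applied to each map to extract its distribution information, giving a high-dimensional feature for each image, and Locality Preserving Projections (LPP) then reduces the dimensionality to obtain low-dimensional features for retrieval. Algorithm 2 starts from the observation that deriving saliency maps from high contrast is a reasonable biologically inspired approach. To select the spectral component with the most detail, the greatest contrast, and the widest gray-level distribution, it uses the principal component map as the intensity primary visual feature. It also introduces the gradient map, which is rich in edge, texture, and shape information, as a further primary visual feature, thereby improving the Itti model. This improved model also yields 50 visual feature maps, which are processed in the same way as in Algorithm 1 to obtain low-dimensional features for retrieval.

Prior label information can be brought into the learning process in two ways: by learning a suitable distance metric, or by supervised learning of a feature-mapping subspace. Based on different ways of learning the distance metric, this thesis proposes two relevance feedback algorithms. The basic idea is to replace the Euclidean distance with a discriminative distance metric, which introduces prior discriminative information into the k-nearest-neighbor patches, enlarges the margin between classes, and improves the generalization ability of manifold learning. At the same time, the distribution of unlabeled image points is used to learn the relations among image points in a semi-supervised way, and a graph-based strategy models the relations among the image points in the database. When labeled points are scarce, the distribution of unlabeled points helps uncover the latent semantic information among images as accurately as possible to support retrieval.

Supervised feature-mapping subspace learning, which finds a meaningful low-dimensional embedding of high-dimensional data, is another effective way to introduce known label information and build a relevance feedback mechanism. On this basis, two further relevance feedback algorithms are proposed. Algorithm 1 uses the labels that the user assigns in each feedback round to build a within-class nearest-neighbor graph and a between-class nearest-neighbor graph, which describe the geometric and discriminative structure of the image database. Under the constraint of the overall graph it seeks an optimal projection, and in the resulting low-dimensional space it propagates labels with the linear neighborhood propagation algorithm. Algorithm 2 makes full use of both the labels the user assigns in each feedback round and the spatial geometric distribution of unlabeled image points to learn the relations among image points in a semi-supervised way, which mitigates the loss of effectiveness of Linear Discriminant Analysis (LDA) when few labels are known. LDA is combined with linear neighborhood propagation to obtain a discriminative embedding subspace, labels are assigned with linear neighborhood label propagation, and relevance-feedback image retrieval is thereby realized.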
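The LBP-HF descriptor named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes 8 neighbours at radius 1 taken directly from the pixel grid (no interpolation), operates on one 2-D map given as a list of lists, and uses hypothetical names (`lbp_code`, `lbp_hf`). The idea shown is that rotating the image cyclically shifts the histogram of uniform patterns, so the magnitudes of its discrete Fourier transform do not change.

```python
import cmath

# Assumption: 8 neighbours at radius 1 on the pixel grid, in circular order,
# with no sub-pixel interpolation.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
P = len(OFFSETS)
FULL = (1 << P) - 1  # the all-ones pattern


def rotate_left(code, r):
    """Circularly rotate a P-bit code left by r bits."""
    return ((code << r) | (code >> (P - r))) & FULL


# Uniform patterns: n consecutive ones (1 <= n < P), rotated by r positions.
UNIFORM = {rotate_left((1 << n) - 1, r): (n, r)
           for n in range(1, P) for r in range(P)}


def lbp_code(img, y, x):
    """LBP code of the interior pixel (y, x): bit p is set when neighbour p >= centre."""
    center = img[y][x]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << p
    return code


def lbp_hf(img):
    """Return the LBP-HF feature vector of a 2-D map (list of equal-length rows)."""
    h, w = len(img), len(img[0])
    hist = [[0.0] * P for _ in range(P)]  # hist[n][r]; rows 1..P-1 are used
    zeros = ones = other = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = lbp_code(img, y, x)
            if code == 0:
                zeros += 1
            elif code == FULL:
                ones += 1
            elif code in UNIFORM:
                n, r = UNIFORM[code]
                hist[n][r] += 1
            else:
                other += 1  # all non-uniform patterns share one bin
    total = (h - 2) * (w - 2)
    feats = []
    for n in range(1, P):
        for u in range(P // 2 + 1):
            # DFT over the rotation index r; only the magnitude is kept.
            coeff = sum(hist[n][r] * cmath.exp(-2j * cmath.pi * u * r / P)
                        for r in range(P))
            feats.append(abs(coeff) / total)
    feats += [zeros / total, ones / total, other / total]
    return feats
```

With these settings the vector has 7 × 5 + 3 = 38 entries per map. In the pipeline described above, one such vector would be computed for each feature map and the vectors concatenated before dimensionality reduction.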

【Abstract】 As the number of digital images grows rapidly, retrieving the required information from massive image data efficiently and quickly has become an important issue in image applications. Image content can be divided into two categories: visual content and semantic content. The former is the physical representation of an image, such as texture, color, shape, and orientation; the latter is the information the image carries, such as theme, scene, and people. To retrieve images, one must first understand the image content and represent it formally. This is difficult because image content is complex and human cognition is subjective.

This thesis studies feature extraction and ways of overcoming the semantic gap in content-based image retrieval (CBIR). It systematically discusses image information representation, feature extraction, and relevance feedback techniques, which are key research topics in image processing and information retrieval. The contents of the thesis are as follows.

To make full use of the saliency information of the visual attention model, the good sparsity of the Contourlet transform, and its ability to capture edge information in images accurately, an algorithm based on visual attention and the Contourlet transform is proposed. First, feature maps for the color, intensity, and orientation features are obtained with the Itti visual attention model, and the directional subbands of the image are obtained by Contourlet multiscale analysis. Then saliency information is extracted from each feature map with Local Binary Pattern Histogram Fourier (LBP-HF) features, and texture information is extracted from the first- and second-order moments of each directional subband. Combining the two realizes image retrieval based on multi-feature fusion.

Driven by the attention mechanism, the visual attention model highlights the key information in an image and suppresses secondary information according to the user's perception. Building on the Itti visual attention model and constructing different primary visual features, the author therefore proposes two feature extraction algorithms based on visual attention. Algorithm 1 forms a roughness map from the relationship between texture characteristics and visual perception and takes it as a primary visual feature of the visual attention model. The improved model yields 50 visual feature maps. LBP-HF extracts the distribution information of each map, giving a high-dimensional feature for each image, and Locality Preserving Projections (LPP) then reduces the dimensionality to obtain low-dimensional features for image retrieval. Algorithm 2 rests on the observation that deriving the saliency map from high contrast is a reasonable biomimetic approach. To choose the spectral component with the most detail, the largest contrast, and the widest gray-level distribution, it uses the principal component map as the intensity primary visual feature, and it adds the gradient map, which is rich in edge, texture, and shape information, as another primary visual feature. The improved Itti model again yields 50 visual feature maps, from which low-dimensional features for image retrieval are obtained by the same process as in Algorithm 1.

Prior label information can be introduced into the learning process in two ways: by learning a proper distance metric, or by supervised feature-mapping subspace learning. Based on different distance metric learning methods, this thesis proposes two relevance feedback algorithms. The Euclidean distance is replaced with a discriminative distance metric, which puts prior discriminative information into the k-nearest-neighbor patches, enlarges the margin between classes, and improves the generalization ability of manifold learning. At the same time, the distribution of unlabeled image points is used to learn the relations among image points in a semi-supervised way, and a graph-based strategy establishes a model of the relations among the image points in the database. When labeled image points are scarce, the distribution of unlabeled points is used to uncover the latent semantic information among images as accurately as possible to assist retrieval.

Supervised feature-mapping subspace learning, which finds a meaningful low-dimensional embedding of high-dimensional data, is another effective way to introduce label information and construct a relevance feedback mechanism. On this basis, the thesis proposes two further algorithms. Algorithm 1 uses the labels assigned by the user in each feedback round to construct a within-class nearest-neighbor graph and a between-class nearest-neighbor graph, which describe the geometric and discriminative structure of the image database, and finds an optimal projection under the constraint of the overall graph. In this low-dimensional projection space, labels are propagated by the linear neighborhood propagation (LNP) algorithm. Algorithm 2 uses both the labels assigned by the user and the spatial geometric distribution of unlabeled image points to learn the relations among image points in a semi-supervised way, which reduces the loss of effectiveness of Linear Discriminant Analysis (LDA) when few labels are known. It combines LDA with the LNP algorithm to obtain a discriminative embedding subspace, completes the labeling with the LNP algorithm, and thereby realizes image retrieval with relevance feedback.
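The graph-based label propagation step behind the relevance feedback algorithms can be illustrated with a small sketch. It is a simplified stand-in, not the thesis method: LNP derives its edge weights from linear neighborhood reconstruction, whereas this sketch assumes row-normalised Gaussian weights over k nearest neighbours to stay short. The function names, `sigma`, `alpha`, and `iters` are hypothetical. User feedback is encoded as +1 (relevant), -1 (irrelevant), and 0 (unlabeled), and scores are spread with the iteration f ← αWf + (1 − α)y.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_weights(points, k, sigma=1.0):
    """Row-normalised Gaussian weights over each point's k nearest neighbours.

    Assumption: Gaussian weights replace LNP's reconstruction weights.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((euclid(points[i], points[j]), j)
                         for j in range(n) if j != i)[:k]
        raw = [math.exp(-d * d / (2.0 * sigma * sigma)) for d, _ in nearest]
        total = sum(raw)
        for (_, j), w in zip(nearest, raw):
            W[i][j] = w / total
    return W


def propagate(W, labels, alpha=0.9, iters=200):
    """Spread feedback labels over the graph and return one score per image.

    labels: +1 relevant, -1 irrelevant, 0 unlabeled.
    """
    n = len(labels)
    y = [float(v) for v in labels]
    f = y[:]
    for _ in range(iters):
        # Each point mixes its neighbours' scores with its own feedback label.
        f = [alpha * sum(W[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * y[i]
             for i in range(n)]
    return f
```

Sorting the images by descending score gives the ranking for the next feedback round. In the algorithms summarised above, the distances would be computed in the learned low-dimensional subspace instead of the raw feature space.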
