Research on Feature Representation and Retrieval Strategy for Content-based Image Retrieval

【Author】 Hou Gang (侯刚)

【Advisor】 Chen Hexin (陈贺新)

【Author information】 Jilin University, Communication and Information Systems, 2014, PhD

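【Abstract, translated from Chinese】 In recent years, with the rapid development of network and computer technology, society has entered an era of network data marked by "big data", and leading international journals such as Nature and Science have published special issues on big data research. Research on network data plays an important role in maintaining social stability, promoting social development, improving industrial competitiveness, fostering emerging strategic industries, and shaping the methodology of scientific research. Images, as multimedia information rich in content, are an important component of network data, and with the growing popularity of the Internet and the continued development of network technology they are increasingly becoming its dominant form. How to mine the rich information contained in massive image data, and how to organize, analyze, and manage that data effectively, has become an important development direction and research hotspot in information processing in the network data era.

Network image data is characterized by a huge data volume, high dimensionality and information resolution, an unstructured form, and diversity, ambiguity, and uncertainty of interpretation. These characteristics make it difficult to apply results from related fields directly. How to organize, represent, store, query, and retrieve such massive image data in a reasonable way is an urgent problem and a major challenge. It is therefore of both theoretical and practical value to build efficient image retrieval models and methods that draw on multiple disciplines, combine the theories and methods of digital image processing, pattern recognition, statistical learning, and machine vision with traditional database technology, associate the low-level visual attributes of images with high-level semantic information, and retrieve the images that users need and are satisfied with. Content-based image retrieval has accordingly become the focus and trend of research on this problem.

Current retrieval models and methods fall short of users' needs, mainly because images are still mostly described and represented by low-level physical attributes, which leaves a large gap between these features and the user's intent, that is, high-level semantic information (such as the topic type, events, or emotion an image expresses). On the one hand, more effective models and methods for describing and representing images are needed; on the other hand, user information should be fully exploited to bridge low-level visual features and high-level semantic concepts. To address these problems, this thesis draws on machine learning, image processing, and human visual cognition, and studies three aspects of retrieval: image description and representation, retrieval based on dynamic feedback, and retrieval based on the visual attention mechanism. The main work is as follows.

1. Image retrieval based on an improved random walk

Classical random-walk-based image retrieval is a relatively mature retrieval model. However, the random walk generally does not take into account the importance of individual feature components or the spatial relationships between images, which causes problems in image representation and retrieval efficiency. This thesis proposes three strategies in response.

First, because features differ in their importance for retrieval, a feature selection approach is adopted that weights each feature component to indicate the importance of each dimension. The Laplacian score preserves local structure well and is efficient, so it is used for feature weighting.

Second, because the K nearest neighbors of each image reflect the spatial structure among images, K-nearest-neighbor selection is incorporated into the random walk. During relevance feedback, finding the K nearest neighbors of all relevant and irrelevant images greatly improves the retrieval efficiency of the random walk and makes the solution more complete and closer to optimal. The proposed method was compared with several popular retrieval methods on classical databases using different image features, and the experimental results show that it outperforms them.

Finally, a hierarchical retrieval strategy that includes both feature weighting and the K-nearest-neighbor method is proposed within the random-walk retrieval framework. Experiments show that, with the same image features and on different databases, the retrieval results of the proposed method are clearly better than those of the two comparison methods. Its precision and recall are higher than those of the two comparison methods, and its running time is roughly the same as that of classical random-walk-based image retrieval.

2. Feedback retrieval based on dynamic manifold feature updating

Compared with traditional linear dimensionality reduction algorithms, manifold learning assumes that sample points in a high-dimensional space lie on, or approximately on, a nonlinear manifold. Its goal is to discover the nonlinear manifold structure in a data set and to preserve that structure as far as possible while reducing dimensionality. Based on the Maximum Margin Criterion (MMC), this thesis proposes a feedback retrieval strategy with dynamic manifold feature updating. The method seeks the optimal linear subspace by maximizing the average margin between classes, and makes full use of user feedback to build the query manifold space dynamically. MMC thus improves computational efficiency and also avoids the small sample size problem.

First, the relevant images selected by the user are used to reduce the dimensionality of the original features with MMC, which yields a feature transformation matrix. Next, the transformation matrix is applied to the original features, and the random walk algorithm computes a score for every sample from the updated features and the user-labeled samples. Finally, the K images with the highest scores are shown to the user, and the process repeats until the stopping condition is met. Experiments show that this strategy generalizes well, outperforms single-sample retrieval methods, and achieves high retrieval performance.

3. Image retrieval based on the mutual information descriptor

This thesis proposes a mutual information descriptor for describing and representing images in retrieval. The descriptor is extracted and represented under the guidance of the visual cognitive mechanism: features are extracted in accordance with human visual cognition and the structure of the human eye, and are fused by simulating how the nervous system transmits information. Features are extracted with the descriptor in the subconscious (pre-attentive) stage and represented in the conscious (attentive) stage, and the result is used for retrieval. Because cone and rod cells are sensitive to color and orientation, the descriptor constructs orientation-sensitive and color-sensitive maps, and is therefore consistent with how human retinal cells acquire features. The representation of the descriptor is built by simulating information transmission in the nervous system. During feature fusion, machine learning is used to simulate the surrounding environment and to apply constrained weights to the features extracted by the visual cells, and the image is finally reconstructed in the "brain" from the feature vector. The descriptor therefore captures color, shape, and texture together with the distribution of these features, and has a degree of spatial localization. It is also low-dimensional, which greatly reduces the time and space complexity of the algorithm. Experiments show that, compared with classical image retrieval methods such as the edge orientation histogram and the micro-structure descriptor, the mutual information descriptor has higher indexing ability, is invariant to translation and affine transformation, and retrieves images more accurately and comprehensively.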
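
The Laplacian-score feature weighting mentioned in part 1 can be illustrated with a minimal sketch. It uses the standard Laplacian score definition (heat-kernel weights on a symmetric K-nearest-neighbor graph, where a lower score means better locality preservation). The function names, the parameters `k` and `t`, and in particular the conversion from scores to weights in `scores_to_weights` are assumptions made for illustration; the abstract does not state how the thesis maps scores to weights.

```python
import math


def laplacian_scores(X, k=2, t=1.0):
    """Return one Laplacian score per feature column of X (list of rows).

    Lower scores indicate features that better preserve local structure.
    """
    n, d = len(X), len(X[0])
    # Pairwise squared Euclidean distances between samples.
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    # Symmetric kNN graph with heat-kernel edge weights exp(-dist / t).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)
    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean of the feature.
        mean = sum(f[i] * deg[i] for i in range(n)) / total
        ft = [v - mean for v in f]
        # ft^T L ft equals half the weighted sum of squared differences.
        num = 0.5 * sum(S[i][j] * (ft[i] - ft[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(deg[i] * ft[i] ** 2 for i in range(n))  # ft^T D ft
        scores.append(num / den if den > 0 else float("inf"))
    return scores


def scores_to_weights(scores, eps=1e-12):
    """Hypothetical mapping (an assumption, not from the thesis):
    weight each feature by the inverse of its score, normalized to sum to 1."""
    inv = [1.0 / (s + eps) for s in scores]
    total = sum(inv)
    return [v / total for v in inv]
```

Calling `laplacian_scores` on a small feature matrix and passing the result to `scores_to_weights` gives per-dimension weights that could then scale the features before a distance or graph computation.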

【Abstract】 With the rapid development of Internet and computer technology, society has entered an era of network data marked by "big data". Leading international journals such as Nature and Science have recently published special issues on big data research. The study of network data plays an important role in maintaining social stability, promoting social development, improving industrial competitiveness, fostering emerging strategic industries, and advancing the methodology of scientific research.

Images are an important part of network data and a kind of multimedia information with rich content. With the growing popularity of the Internet and the continuous development of network technology, images are increasingly becoming the mainstream of network data. How to mine the rich information contained in images, and how to organize, analyze, and manage image data effectively, has become an important development direction and research hotspot in information processing in the era of network data.

Network image data is characterized by a huge volume, high dimensionality and resolution, an unstructured form, and diversity, vagueness, and uncertainty of interpretation. These characteristics make it difficult to apply results from related fields directly to network image data. How to organize, represent, store, query, and retrieve such huge amounts of image data in a reasonable way is a pressing problem and a significant challenge.

It is therefore of theoretical significance and practical value to establish efficient image retrieval models and methods that cross disciplines, combine the theories and methods of digital image processing, pattern recognition, statistical learning, and machine vision with traditional database technology, relate the low-level visual attributes of images to high-level semantic information, and retrieve the images that users require and are satisfied with. Content-based image retrieval has accordingly become the research focus and trend for solving this problem.

Current retrieval models and methods struggle to meet users' needs, mainly because most image descriptions and representations are based on low-level physical attributes, which leaves a large gap with the user's intent, that is, high-level semantic information (such as image topic type, events, or emotional expression). On the one hand, more effective models and methods for describing and representing images are needed; on the other hand, user information should be fully exploited to build a bridge between low-level visual features and high-level semantic concepts.

The main work of this thesis is as follows.

1. Image retrieval based on improved random walk

Classical image retrieval based on random walk is a relatively mature retrieval model. However, the random walk does not consider the importance of individual feature components or the spatial relationships between images, which often causes problems in image representation and retrieval efficiency. This thesis proposes three strategies in response.

Firstly, because image features differ in their importance for retrieval, we adopt a feature selection method and weight the feature components to characterize the importance of each feature. The Laplacian score preserves locality well and is therefore used for feature weighting.

Secondly, the K nearest neighbors of each image embody the spatial structure among images, so K-nearest-neighbor selection is combined with the random walk. During relevance feedback, finding the K nearest neighbors of all relevant and irrelevant images greatly improves retrieval efficiency and makes the solution more comprehensive and closer to optimal. Extensive experiments on classical databases demonstrate that the proposed method is superior to several well-known methods.

Lastly, we improve random-walk-based image retrieval in two ways at once, by adding both feature re-weighting and the K-nearest-neighbor method to it. Using the same image features, our method significantly outperforms the two compared methods on different datasets. In terms of precision and recall, our method performs better than the two compared methods. In running time, our method is almost the same as classical random-walk-based image retrieval.

2. Feedback retrieval based on dynamic manifold feature updating

Compared with traditional linear dimensionality reduction methods, manifold learning assumes that samples distributed in a high-dimensional space lie on, or approximately on, a nonlinear manifold. The aim of manifold learning is to find the nonlinear manifold structure in a dataset and to preserve that structural information as much as possible while reducing dimensionality.

We propose a feedback retrieval method with dynamic manifold feature updating, based on the Maximum Margin Criterion (MMC). The method finds the optimal linear subspace by maximizing the margin between classes and dynamically creates a manifold space for the query from the user's feedback. MMC thus improves computational efficiency while avoiding the small sample size problem.

Firstly, using the relevant images selected by the user, the dimensionality of the original feature space is reduced with MMC to obtain the transformation matrix. Secondly, the transformation matrix is applied to the original features, and the scores of all samples are computed with the random walk algorithm from the updated features and the samples labeled by the user. Lastly, the K images with the highest scores are displayed to the user, and the process is repeated until the stopping condition is satisfied.

3. Content and mutual information descriptor based image retrieval

This thesis proposes a content-based image representation method for image retrieval, called the mutual information descriptor. The extraction and representation of the descriptor are guided by the visual cognitive mechanism. The method extracts features by following the human visual cognitive mechanism and the structure of the human eye, and fuses features by simulating the process of information transmission in the nervous system. Features are extracted in the pre-attentive stage with the mutual information consistency descriptor and are represented in the attentive stage.

Because cone and rod cells are sensitive to color and orientation, the mutual information consistency descriptor constructs orientation-sensitive and color-sensitive maps. The descriptor therefore complies with the way human retinal cells acquire features. Its representation is implemented by simulating the process of information transmission in the nervous system. In the fusion step, the features extracted by retinal cells are weighted under constraints by simulating the surrounding environment with machine learning. Finally, the image is reconstructed in the "brain" from the feature vector. The mutual information consistency descriptor therefore captures color, shape, texture, and the distribution of features, and has some ability of spatial localization. It also has low dimensionality, which greatly reduces time and space complexity.

Experimental results demonstrate that, under the same experimental conditions, the mutual information consistency descriptor has higher retrieval ability than the edge orientation histogram and the micro-structure descriptor, and is invariant to translation and affine transformation when applied to image retrieval. This is because it simultaneously extracts color, shape, texture, and the distribution of features, and has some ability of spatial localization. The proposed method can therefore retrieve images accurately and comprehensively.
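
The scoring step shared by parts 1 and 2 (build a K-nearest-neighbor graph, run a random walk seeded by user feedback, show the top-K images) can be sketched as follows. This is a minimal illustration, not the thesis's formulation: the random walk with restart, the restart parameter `alpha`, the subtraction of the walk seeded at irrelevant images, and all function names are assumptions. The feature rows in `X` stand in for whatever features are in use, for example ones already re-weighted or projected.

```python
def knn_transition(X, k):
    """Row-stochastic transition matrix of a symmetric, unweighted kNN graph.

    Assumes at least two samples and k >= 1, so every node has an edge.
    """
    n = len(X)
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def walk_with_restart(P, seeds, alpha=0.85, iters=200):
    """Random walk with restart: each step returns to the seed set
    with probability 1 - alpha. Returns the visiting probabilities."""
    n = len(P)
    if not seeds:
        return [0.0] * n
    restart = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    r = restart[:]
    for _ in range(iters):
        r = [(1 - alpha) * restart[i]
             + alpha * sum(r[j] * P[j][i] for j in range(n))
             for i in range(n)]
    return r


def rank_images(X, relevant, irrelevant, k=2, top=3):
    """Score each image as (walk from relevant seeds) minus
    (walk from irrelevant seeds); this combination is an assumption."""
    P = knn_transition(X, k)
    pos = walk_with_restart(P, relevant)
    neg = walk_with_restart(P, irrelevant)
    scores = [p - q for p, q in zip(pos, neg)]
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return order[:top], scores
```

In a feedback loop, the returned top images would be shown to the user, the newly marked images added to `relevant` and `irrelevant`, and `rank_images` called again until the user is satisfied.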

  • 【Online publication submitted by】 Jilin University
  • 【Online publication year and issue】 2014, Issue 09