
生物视觉启发的图像识别技术研究

Research on Image Recognition Inspired by Biological Vision

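【Author】 Meng Xianglin (孟祥林)

【Advisor】 Wang Zhengzhi (王正志)

【Author Information】 National University of Defense Technology (国防科学技术大学), Control Science and Engineering, 2011, Ph.D.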

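【Abstract, translated from Chinese】 Image recognition is a current research focus in the vision field. Its fundamental task is to use computers to classify and identify the scenes or objects contained in images, and it has broad application prospects in content-based image retrieval, intelligent environment perception, military target recognition, and other areas. Traditional engineering methods handle visual recognition tasks in structured environments well, but they run into great difficulty with scene classification and object recognition in unstructured natural scenes. Many problems remain to be solved, such as: how to weaken or even eliminate the influence of uncertain environmental factors such as noise, illumination, and occlusion, so as to achieve stable perception of natural images; how to capture the global information of an image effectively for fast scene perception and classification; and how to integrate visual attention mechanisms into the recognition process to improve object recognition performance. Taking natural images as the object of study, this dissertation draws on the mechanisms of human visual perception, together with the physiological structure and function of the visual cortex and relevant experimental findings from cognitive psychology, to carry out exploratory research on these problems. The main work is as follows.

Pre-attentive boundary and surface perception: we study how to detect the boundary contours of natural images robustly and how to perceive object surface lightness stably under different illumination conditions. We analyze the problems that Grossberg's BCS/FCS neural model exhibits when processing natural images and propose corresponding improvements and optimizations. With these, contour detection on natural images is unaffected by noise and small occlusions and is therefore more robust. Surface lightness perception overcomes the lightness information loss, surface fogging, and edge blurring of the original model, effectively recovers the perceived lightness of object surfaces, and is insensitive to illumination changes. In addition, inspired by the neuronal activity diffusion mechanism of the surface recovery process and by cognitive psychology, we propose an image diffusion algorithm based on the visual masking effect that filters out noise while effectively preserving the important structural features of the image.

Fast perceptual classification of scenes: we propose a global feature descriptor for scenes. It captures the global structural properties of a scene, including its approximate geometric information, which is consistent with the psychological view that humans judge the semantic content of a scene by rapidly acquiring its spatial layout. Compared with the classical SIFT descriptor, it is better suited to recognizing scene images. The method is simple to implement and fast to compute.

Modeling the visual spatial attention mechanism: we propose a computational model of visual attention grounded in cognitive psychology and physiology. The input image is mapped into a psychovisual space, a fully connected graph is built on each feature map, and a graph-based random walk simulates information transmission among visual cortex neurons. The final saliency map is generated according to the information maximization principle and feature integration theory. The model outperforms existing models in region-of-interest detection and human fixation prediction. Furthermore, scene information guides selective attention to objects. Combining the proposed global scene descriptor, we build a contextual guidance model of spatial attention that introduces a top-down attentional modulation mechanism, and it predicts task-related active visual search well.

Object recognition incorporating attention: we integrate the visual attention mechanism into object recognition and construct NIMART, an object recognition framework based on fixation shifts. It simulates the fixation shifts humans make while learning and recognizing objects, uses the saliency map generated by attention to guide eye movements, takes into account the inhibition-of-return mechanism during fixation shifts, and uses adaptive resonance theory to learn from, and make decisions on, the features extracted from the fixated regions. NIMART is consistent with the way the human brain learns and recognizes objects, and experiments on general-purpose image datasets show that the model achieves good object recognition performance.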

【Abstract】 Image recognition is one of the most active topics in vision research. Its fundamental task is to use a computer to categorize the scene in an image or to identify the objects it contains. It has broad application prospects in content-based image retrieval, intelligent environment perception, military target recognition, and related fields. Conventional engineering techniques perform well on most visual recognition tasks in structured scenes. For unstructured natural scenes, however, these traditional methods struggle to give satisfactory scene categorization and object recognition results. Many problems remain to be solved, for example: how to weaken or eliminate the influence of uncertain environmental factors such as noise, illumination, and occlusion, so as to achieve stable perception of natural images; how to capture the scene gist effectively for fast scene perception and classification; and how to incorporate visual attention mechanisms into image recognition to improve object recognition performance.

Inspired by the visual perception mechanisms of humans and primates, and drawing on the physiological properties of the visual cortex and relevant findings from cognitive psychology, we carry out exploratory studies on the issues above in natural image recognition. The main work of this dissertation is as follows.

Pre-attentive boundary and surface perception: we study how to detect the boundary contours of natural images robustly and how to generate stable surface lightness percepts under varying illumination. We analyze the problems that the BCS/FCS neural model proposed by Grossberg exhibits when processing natural images, and propose corresponding modifications and optimizations.
As a result, contour detection on natural images becomes robust to noise and small occlusions. The modified lightness perception model effectively recovers absolute lightness percepts, is insensitive to illumination changes, and overcomes problems of the original model such as lightness information loss, fogging, and blurring. In addition, inspired by the neuronal activity diffusion mechanism in surface recovery and by cognitive psychology, we propose a novel image diffusion algorithm based on the visual masking effect. The algorithm smooths noise effectively while preserving the important structural properties of the image.

Fast perception of scene images: we propose a novel visual descriptor for scene recognition. It is a holistic representation that captures the global structural properties and rough geometric information of the scene. The method is consistent with psychophysical findings suggesting that humans quickly grasp the scene gist and perceive scene categories from the spatial layout of a scene. Experimental results show that it outperforms the classical SIFT descriptor in recognizing scene categories. It is easy to implement and fast to compute.

Modeling the visual spatial attention mechanism: we put forward a computational model of visual attention motivated by cognitive psychology and neurophysiology. First, the input image is transformed into a psychovisual space. We then construct a fully connected graph on each feature map and run a random walk on each sub-band graph to simulate information transmission among neurons in the visual cortex. From the information maximization principle, we derive an activity map for every feature map. Finally, we obtain the saliency map by summing all the activity maps, in accordance with feature integration theory.
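The graph-based step can be illustrated with a minimal sketch. It is not the dissertation's model: the edge weights (feature dissimilarity times a Gaussian falloff with distance) are an assumption borrowed from common graph-based saliency formulations, the psychovisual transform and the information maximization step are omitted, and the names `random_walk_activity`, `sum_activity_maps`, `sigma`, and `iters` are hypothetical.

```python
import math


def random_walk_activity(feature_map, sigma=1.5, iters=200):
    """Stationary distribution of a random walk over the pixels of one feature map.

    Assumption: edge weight = feature dissimilarity * Gaussian distance falloff.
    """
    h, w = len(feature_map), len(feature_map[0])
    nodes = [(r, c) for r in range(h) for c in range(w)]
    n = len(nodes)

    # Fully connected graph: every pixel is linked to every other pixel.
    trans = []
    for ri, ci in nodes:
        row = []
        for rj, cj in nodes:
            if (ri, ci) == (rj, cj):
                row.append(0.0)
                continue
            diff = abs(feature_map[ri][ci] - feature_map[rj][cj])
            dist2 = (ri - rj) ** 2 + (ci - cj) ** 2
            # Small constant keeps every edge positive, so the chain is irreducible.
            row.append((diff + 1e-6) * math.exp(-dist2 / (2 * sigma ** 2)))
        total = sum(row)
        trans.append([x / total for x in row])  # row-normalise to probabilities

    # Lazy power iteration: averaging with the current state damps oscillation
    # and leaves the stationary distribution unchanged.
    p = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(p[i] * trans[i][j] for i in range(n)) for j in range(n)]
        p = [0.5 * a + 0.5 * b for a, b in zip(p, step)]

    return [[p[r * w + c] for c in range(w)] for r in range(h)]


def sum_activity_maps(maps):
    """Element-wise sum of equally sized activity maps into one saliency map."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    # A single distinctive pixel in a flat feature map attracts the walk.
    fmap = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    activity = random_walk_activity(fmap)
    saliency = sum_activity_maps([activity])
    for row in saliency:
        print(["%.3f" % v for v in row])
```

In this toy setting the walk spends most of its time at pixels that differ from their surroundings, which is the intuition behind reading the stationary distribution as an activity map.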
The proposed visual attention model outperforms existing models in detecting regions of interest and predicting human fixations. In addition, scene gist information can guide selective attention to objects in the scene. We therefore present a contextual guidance model of spatial attention that introduces top-down modulation, with the gist information obtained from the global image representation proposed above. In active visual search tasks, the contextual guidance model predicts well the image regions that human observers are likely to fixate.

Object recognition incorporating the visual attention mechanism: we build NIMART, a framework for object recognition inspired by saccade-based visual memory. The framework incorporates visual attention into object recognition. NIMART simulates the sequence of fixations on salient locations that observers make when they learn and recognize objects in a scene. The saliency map derived from the visual attention model guides the eye movements, and inhibition of return is taken into account in the fixation sequence. The fixated regions are analyzed for learning and decision-making with adaptive resonance theory (ART). NIMART accords with the way the brain learns and recognizes objects. Experiments demonstrate that it performs well on widely used datasets.
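The saliency-guided fixation sequence with inhibition of return can be sketched as a winner-take-all loop. This is only an illustration under stated assumptions, not NIMART itself: the square suppression window, the function name `scanpath`, and the parameters `n_fixations` and `ior_radius` are hypothetical, and the ART-based learning and decision stage is left out.

```python
def scanpath(saliency, n_fixations=3, ior_radius=1):
    """Pick successive fixations from a saliency map with inhibition of return.

    Assumption: after each fixation, a square window of half-width
    `ior_radius` around it is suppressed so that it cannot be chosen again.
    """
    sal = [row[:] for row in saliency]  # work on a copy of the input map
    h, w = len(sal), len(sal[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: the most salient remaining location is fixated next.
        r, c = max(
            ((i, j) for i in range(h) for j in range(w)),
            key=lambda rc: sal[rc[0]][rc[1]],
        )
        if sal[r][c] == float("-inf"):
            break  # every location has already been inhibited
        fixations.append((r, c))
        # Inhibition of return: suppress the neighbourhood just visited.
        for i in range(max(0, r - ior_radius), min(h, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(w, c + ior_radius + 1)):
                sal[i][j] = float("-inf")
    return fixations


if __name__ == "__main__":
    saliency_map = [
        [0.1, 0.2, 0.0, 0.0],
        [0.2, 0.9, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.7, 0.0],
    ]
    print(scanpath(saliency_map, n_fixations=2))
```

In a fuller pipeline, features extracted around each returned location would be passed to the learning stage. Here the function stops at the list of fixated coordinates.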
