视觉注意计算模型的研究及其应用

Research and Application on Computational Model of Visual Attention

【Author】 陈嘉威

【Advisor】 周昌乐

【Author Information】 Xiamen University, Fundamentals of Artificial Intelligence, 2009, PhD

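【Abstract (Chinese)】 Visual attention is an important psychological regulation mechanism in human information processing. It is the conscious activity by which humans select and retain useful information from the large volume of external input while rejecting useless information, and it underpins the efficiency and reliability of human visual perception. Research on computational models of visual attention not only helps to explore how human visual information processing works, but also matters for solving data filtering problems and for improving the efficiency of computer information processing. It has important applications in image analysis and understanding, object detection, information retrieval, robot vision, and video communication. This dissertation studies the visual attention mechanism and its computational methods in depth. It analyzes and summarizes the cognitive neuroscience theories and neural processing mechanisms of visual attention. Based on the physiological theory of human visual processing and the requirements of computer vision, it builds the architecture of a dynamic visual attention computational model consisting of three parts: feature processing, attentional capture, and attentional control. It proposes a two-pathway, hierarchical feature processing structure; a method for measuring depth and motion features that reflects the influence of a scene's spatiotemporal properties on visual attention; a simulation of two-pathway feature integration with an IFNN neural network to achieve attentional capture; and an attentional control scheme with sustained-attention and arousal functions. On this basis, a dynamic visual attention computational model based on object selection is implemented. Experiments show that the proposed methods are effective.

The dissertation first summarizes the cognitive neuroscience and psychological theories of the visual attention mechanism and identifies, from the field of biological vision, the neurophysiological evidence that computer vision can draw on. Taking this as the starting point, it looks for the meeting point between the attention mechanisms of cognitive psychology and computer science, builds the architecture of a dynamic visual attention computational model, and divides visual attention computation into three modules (feature processing, attentional capture, and attentional control), thereby combining cognitive neuroscience with visual computation.

Feature processing addresses three problems: feature selection, the saliency measure of features, and the hierarchy of feature processing. For feature selection, features are divided into spatial and non-spatial features. Spatial features are extracted with a proposed method for computing depth and motion features, which reflects the influence of the scene's spatiotemporal properties on visual attention. Non-spatial features are obtained by extracting intensity, color, and orientation features. The saliency of each type of feature is measured by computing visual contrast. Reflecting the differences in hierarchy and function of feature processing in biological vision, spatial and non-spatial features are processed by simulating the "what" and "where" two-pathway theory, and the saliency map of each type of feature is obtained through competition and integration among its sub-features.

For attentional capture, guided by visual pathway theory, non-spatial features such as intensity, color, and orientation are used to describe the perception of objects, simulating the main function of the "what" pathway. Spatial features such as motion and depth describe the motion and spatial information of the scene, simulating the function of the "where" pathway. The two pathways are integrated by an IFNN neural network with self-learning and adjustable mechanisms. The focus of attention is selected according to the spike firing times of the network: when the inputs of the two pathways are correlated, the network produces its largest gain, and the output unit fires sooner than uncorrelated units.

For attentional control, an attentional control scheme for the dynamic visual attention model is proposed, based on the neural control characteristics of visual attention. An arousal signal describes the strength of novel stimuli in the visual field. Depending on the magnitude of the arousal signal, a threshold is enabled or masked to control the transition between the sustained-attention state and the arousal state. A focus-of-attention tracking algorithm implements sustained attention in dynamic scenes, and a location enhancement method is proposed to increase the visual saliency of the location of a novel stimulus.

The research gives a fairly complete account of the ideas and methods of visual attention computation, implements a broadly applicable dynamic visual attention computational model, and enhances the theoretical and practical value of that model. Experiments show that the proposed model makes good use of the regularities of visual cognition, so that the results of visual attention processing agree more closely with the basic characteristics of human visual perception.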
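The abstract states that feature saliency is measured by visual contrast but gives no formula. As a minimal sketch only, the following assumes a generic center-surround difference on a single feature map (for example, intensity); it is not the dissertation's actual measure. The names `box_mean`, `contrast_saliency`, and `most_salient`, and the window radii, are hypothetical choices made for illustration.

```python
def box_mean(img, r, c, radius):
    """Mean of img over a square window centred at (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
    c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
    vals = [img[i][j] for i in range(r0, r1) for j in range(c0, c1)]
    return sum(vals) / len(vals)


def contrast_saliency(img, center_radius=0, surround_radius=2):
    """Per-pixel |center mean - surround mean|, normalised to [0, 1].

    Assumption: a plain center-surround difference stands in for the
    contrast-based saliency measure mentioned in the abstract.
    """
    rows, cols = len(img), len(img[0])
    sal = [
        [
            abs(box_mean(img, r, c, center_radius) - box_mean(img, r, c, surround_radius))
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    peak = max(max(row) for row in sal)
    if peak == 0:
        return sal  # uniform map: nothing stands out
    return [[v / peak for v in row] for row in sal]


def most_salient(sal):
    """Return the (row, col) position with the highest saliency value."""
    positions = ((r, c) for r in range(len(sal)) for c in range(len(sal[0])))
    return max(positions, key=lambda rc: sal[rc[0]][rc[1]])


# A dark 7x7 feature map with a single bright spot.
image = [[0.0] * 7 for _ in range(7)]
image[3][4] = 1.0
saliency = contrast_saliency(image)
print(most_salient(saliency))
```

In a fuller model, one such map per sub-feature would be combined through the competition and integration step the abstract describes; that step is not sketched here.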

【Abstract】 Visual attention is regarded as an essential cognitive process of the human visual system. Human vision relies on the visual attention mechanism to select the relevant parts of a scene, on which higher-level tasks can then be performed. Since information from only a small region of the visual field can progress through the cortical visual hierarchy, visual tasks can be handled effectively with limited processing resources. Computational visual attention models are based on biological models of visual attention and mimic this ability of the visual system. Research on computational models of visual attention not only helps in understanding the working mechanism of the human visual system, but also has important applications in image analysis and understanding, object detection, information retrieval, robot vision, video communication, and other areas.

The dissertation addresses the visual attention mechanism and its computational methods. The cognitive neuroscience theories and the neural mechanisms of visual attention are analyzed. In line with the requirements of computer vision, an architecture for a dynamic visual attention model is established, based on the biophysical and neurophysiological theories of human visual processing. This architecture is composed of three main parts: feature processing, attentional capture, and attentional control. A two-pathway, hierarchical feature processing scheme is proposed. Depth features and motion features are extracted to measure the third spatial dimension and the time scale of complex environments. The integration between the two pathways in the brain is simulated by an integrate-and-fire neural network (IFNN), which is used to compute the focus of attention. An attentional control approach for the dynamic visual attention model is developed to mimic the sustained attention mechanism.
Based on these theories and techniques, a visual attention model for dynamic scenes based on object selection is implemented.

The cognitive neuroscience theories and neural mechanisms of visual attention are first analyzed. By summarizing recent studies of biological vision in light of the requirements of computer image processing, biological inspiration for computer vision is identified. Following a bionic approach, the architecture of the dynamic visual attention model is established, relating computer image processing to biophysics and neurophysiology. The model is composed of three main parts: feature processing, attentional capture, and attentional control.

Three major problems are solved in feature processing: feature selection, saliency computation, and the hierarchical processing of features. For feature selection, features are classified into two types: spatial and non-spatial. Spatial features, which include depth and motion features, are extracted to measure the third spatial dimension and the time scale of complex environments. Non-spatial features include intensity, color, and orientation, among others. Saliency is computed from feature contrast. Reflecting the differences in hierarchy and function among features, spatial and non-spatial features are processed in a way that simulates the two processing pathways in the brain. The saliency map of each feature is created by competition and integration among its sub-features.

For attentional capture, approaches to feature integration and attention focusing are developed based on the two-pathway theory. The non-spatial features (intensity, color, and orientation) serve as perceptual information about the object and are transmitted in the "what" pathway. Spatial and motion information, associated with the "where" pathway, is represented by depth features and motion features.
An integrate-and-fire neural network (IFNN) is employed to simulate the three-way relationship between the two inputs and the response, giving the model dynamic and modulatory properties. The IFNN captures the correlation between the two input sets, producing a gain when the two inputs are consistent. When the stimuli from the two pathways are correlated, the interspike interval is shortened. After the interspike intervals are calculated, the focus of attention is allocated to the likely target position in the original image.

Following the neural mechanisms of visual attention, an attentional control approach for the dynamic visual attention model is developed. An arousal signal measures the strength of new stimuli in the scene and serves as the parameter for deciding whether attention is sustained. A tracking algorithm is proposed to mimic the sustained attention mechanism in dynamic scenes. If the arousal signal does not exceed the threshold, the focus of attention moves according to the tracking process. If the arousal signal exceeds the threshold, a location enhancement method is applied to enhance the saliency of the new stimuli.

A computational model of visual attention for dynamic scenes is the goal of this research. The theories and approaches are combined to implement the model, which is broadly applicable and of practical value. The experimental results also show that the computational theories and techniques applied in the system are valid and effective.
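The spike-timing idea in the abstract (correlated "what" and "where" inputs produce a gain, so the corresponding unit fires sooner) can be illustrated with a single leaky integrate-and-fire unit per location. This is a sketch under stated assumptions, not the dissertation's IFNN: the multiplicative gain term, the parameter values, and the names `first_spike_time` and `focus_of_attention` are all hypothetical, and the network's self-learning mechanism is not modelled.

```python
def first_spike_time(what, where, gain=2.0, tau=10.0, threshold=1.0,
                     dt=0.1, t_max=200.0):
    """Time at which a leaky integrate-and-fire unit first crosses threshold.

    Assumption: the correlation gain is modelled as gain * what * where,
    so the drive is largest when both pathways are active together.
    Returns None if the unit never fires within t_max.
    """
    drive = what + where + gain * what * where
    v, t = 0.0, 0.0
    while t < t_max:
        v += dt * (-v / tau + drive)  # forward-Euler step of dv/dt = -v/tau + drive
        t += dt
        if v >= threshold:
            return t
    return None


def focus_of_attention(inputs):
    """Pick the location whose unit fires first.

    `inputs` maps a location label to a (what, where) pair of saliency values.
    """
    latencies = {loc: first_spike_time(w, s) for loc, (w, s) in inputs.items()}
    fired = {loc: t for loc, t in latencies.items() if t is not None}
    return min(fired, key=fired.get) if fired else None


# Location A is moderately salient in both pathways; B is salient in only one.
candidates = {"A": (0.5, 0.5), "B": (1.0, 0.0), "C": (0.0, 0.0)}
print(focus_of_attention(candidates))
```

Although A and B receive the same summed input, the assumed gain term gives A the larger drive, so its unit reaches threshold earlier and A is selected. C receives no input and never fires.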

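  • 【Online Publication Contributor】 Xiamen University
  • 【Online Publication Year/Issue】 2009, Issue 12
  • 【Classification Number】 TP391.41
  • 【Citation Count】 30
  • 【Download Count】 1981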