
视觉注意模型及其在目标感知中的应用研究

Research of Visual Attention Model and Its Application on Biologically Inspired Object Recognition

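【Author】 Xiao Jie (肖洁)

【Advisor】 Ding Mingyue (丁明跃)

【Author information】 Huazhong University of Science and Technology, Pattern Recognition and Intelligent Systems, 2010, PhD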

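【Abstract (Chinese version)】 The early human visual system uses attention mechanisms to process important information according to current behavior and visual tasks. Processing information this way balances computing resources effectively, reduces time cost, and handles different visual tasks in complex scenes. When a computer processes complex scene information, applying a visual attention mechanism allocates limited computing power more effectively to the important processing tasks. Computational models of visual attention generally use two kinds of information to guide shifts of attention: bottom-up information based on image saliency and top-down information based on the task. How to use these two kinds of information effectively to direct attention rapidly to the target region of interest, laying the foundation for subsequent object recognition, is of great significance. Drawing on neuroscience, pattern recognition, and image processing theory, this dissertation analyzes biological visual information processing in depth, studies computational visual attention mechanisms, and applies them to object search and recognition. The main work is as follows.

First, a method for extracting visually attended regions is studied. Attended regions are extracted by combining saliency-based region selection with the scale-space primal sketch. For an input color image, salient points are found with a data-driven attention model, and saliency-based region selection yields the salient region. The color image is then converted to a gray-level image, and the scale-space primal sketch is used to obtain the coordinates and scales of local extrema. Within the salient region already obtained, the extremum with the largest response is located and the blob region is determined at the corresponding scale. Finally, the two spatial regions are merged to obtain a region containing the target. This segmentation is relatively coarse and does not give a strict object boundary, but it covers the target effectively and reduces data redundancy.

Second, a visual attention model based on object accumulation is studied. Blobs reflect the important structures of a target in scale space, and using blobs to guide perceptual grouping focuses attention better on task-relevant regions. By introducing multi-scale blobs, the model links high-level semantics (prior knowledge) with low-level features and builds a representation of prior knowledge from blob features. Given a new scene, the model first computes intermediate data in the visual pre-attention stage and extracts blob features. It then uses the previously built blob-based prior knowledge to direct visual attention quickly and effectively to task-relevant regions. Finally, an object-accumulation mechanism merges blob regions to achieve perceptual grouping and extract the complete target region. The model makes good use of both top-down and bottom-up information. Experiments comparing the new model with the salient-region extraction model and the spectral residual model demonstrate the superiority of the proposed model.

Third, an object search and recognition model based on the object-accumulation visual attention mechanism is studied. An automatic object learning method based on the object-accumulation mechanism is proposed: guided by blobs, the mechanism captures the trend of energy change while the target is accumulated and forms an object representation vector. An object search and recognition method based on the same mechanism is also proposed: it takes the object representation vector as top-down prior knowledge, combines it with bottom-up low-level information from the image, uses blob features to guide shifts of attention, accumulates the object iteratively, extracts the complete target region, and provides a preliminary recognition result. In the experiments, the new model learned and recognized 40 different target objects in 200 images and achieved a recognition rate of 88.5%, demonstrating the effectiveness of the proposed model.

Finally, a SIFT-based method for evaluating the validity of attended regions is studied. Commonly used computational models of visual attention still have many problems. On the one hand, they cannot make full use of the bottom-up image information and the intermediate data produced during preprocessing, so a gap remains between their actual computational efficiency and the perceptual efficiency of the biological visual system. On the other hand, the ways in which top-down prior knowledge is introduced still need improvement. The direct consequence is that the extracted attended regions do not cover the target reasonably and completely. For object recognition, extracting the target region completely and reducing redundant data is crucial. To compare the validity of the attended regions extracted by different visual attention models and to judge their influence on recognition results, this dissertation proposes a novel evaluation method based on the SIFT object recognition algorithm, which yields relatively objective results and avoids the errors introduced by subjective human evaluation.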
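The abstract above names the spectral residual model (波谱残留模型) as one of the comparison baselines. As a rough illustration of what such a bottom-up baseline computes, the following is a minimal NumPy sketch of spectral residual saliency as it is commonly described: subtract a locally averaged log-amplitude spectrum from the log-amplitude spectrum, keep the phase, and transform back. It is not the dissertation's code; the function names, the 3x3 averaging window, and the smoothing width `sigma` are assumptions made for this example, and the usual step of downsampling the input to a small fixed size is omitted.

```python
import numpy as np

def box_mean3(a):
    """3x3 mean filter with wrap-around borders (an assumed window size)."""
    total = np.zeros_like(a)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return total / 9.0

def gaussian_blur_fft(a, sigma):
    """Circular Gaussian blur applied in the frequency domain."""
    fy = np.fft.fftfreq(a.shape[0])[:, None]
    fx = np.fft.fftfreq(a.shape[1])[None, :]
    # Frequency response of a Gaussian with standard deviation sigma (pixels).
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(a) * transfer))

def spectral_residual_saliency(gray, sigma=2.5):
    """Return a saliency map in [0, 1] with the same shape as `gray`."""
    spectrum = np.fft.fft2(np.asarray(gray, dtype=float))
    log_amp = np.log(np.abs(spectrum) + 1e-8)   # small offset avoids log(0)
    phase = np.angle(spectrum)
    residual = log_amp - box_mean3(log_amp)     # the "spectral residual"
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    sal = gaussian_blur_fft(np.abs(recon) ** 2, sigma)
    sal = np.clip(sal, 0.0, None)               # remove tiny negative round-off
    peak = sal.max()
    return sal / peak if peak > 0 else sal
```

Thresholding the returned map would give candidate attended regions that could then be compared against those produced by another model.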

【Abstract】 The primate visual system employs an attention mechanism to restrict processing to information that is relevant to current behavior or visual tasks. This lets it balance computing resources and time cost while performing different visual tasks in normal, cluttered, and dynamic environments. Applying a visual attention mechanism in a computational model likewise allocates finite computing resources to the more important tasks. Two kinds of information can direct attention: bottom-up, image-based saliency cues and top-down, task-dependent guidance cues. How to use these two kinds of cues efficiently, guide attention promptly to target-relevant regions, and thereby support object recognition is of great significance. Drawing on theory from neuroscience, pattern recognition, and image processing, this dissertation analyzes biological visual attention processes in depth and develops a visual attention mechanism and its application to object search and recognition. The main contributions are as follows.

Development of a new approach to attended-region extraction. A new region extraction model is proposed that combines saliency-based region selection with the scale-space primal sketch. For an input color image, the extent of the object is estimated by saliency-based region selection, which considers the feature that contributes most to the saliency map in the bottom-up visual attention model. The color image is then converted to a gray-level image and the local maxima at each scale are computed. The blob with the largest response that lies within the salient region obtained in the previous step is selected, and the two spatial regions are merged. The resulting segmentation is coarse in the sense that object boundaries are not localized exactly. However, it is safe in the sense that the regions can serve as attended regions that greatly reduce data redundancy.

Development of a new visual attention model based on an object-accumulation mechanism. The work on attended-region extraction shows that blobs can be regarded as reflections of important structures in scale space. Blob features can therefore guide perceptual grouping and direct attention to task-relevant regions. By introducing multi-level blobs and linking blob properties to low-level features, the model builds knowledge representations of prior information from blob features. For a new scene, the proposed model uses the prior knowledge to render the object more salient by enhancing the features characteristic of it, then recursively groups regions into objects, guided by blob features extracted from the intermediate data computed at the pre-attention stage. In this way, selective visual attention is directed effectively to task-relevant regions. A comparison of the proposed model with other attention models demonstrates its superiority.

Development of a model for object search and recognition based on the object-accumulation visual attention mechanism. To describe objects effectively and form the top-down information, an automatic object learning approach based on the object-accumulation mechanism is proposed. The approach reuses data already present in the visual attention framework to represent the target object, produces an accumulation strategy, and outputs an object representation vector. Building on this, an object search and recognition approach based on the object-accumulation mechanism is proposed. The object representation vector serves as top-down information and is combined with bottom-up information from the image. Taking into account blob features extracted from a multi-scale set of low-level feature maps, the model recursively combines regions to form objects, promptly guides attention to the relevant object, extracts the full object region, and provides a preliminary recognition result. The proposed model achieved a recognition rate of 88.5%, which demonstrates its effectiveness.

Development of a novel method, based on the SIFT algorithm, for evaluating how well attended regions contribute to recognition of the target. Current visual attention models still have several problems. On the one hand, they cannot fully utilize the bottom-up information in the image or the intermediate data produced at the pre-attention stage; for complex scenes, a large gap remains between the computational efficiency of the computational model and the perceptual efficiency of the biological visual system. On the other hand, the way prior information is introduced is not yet satisfactory, and a general model that adapts well cannot be obtained. As a consequence, the attended region extracted by a computational model does not cover the target comprehensively, although better coverage of the target region better serves recognition. Based on the SIFT recognition algorithm, a novel evaluation approach is proposed that yields an objective measure of validity rather than the subjective human judgment used in previous work. First, SIFT features are extracted from a reference image and stored in a database in advance as the learned object. For attended regions extracted by different visual attention models, the algorithm computes the SIFT features in each region and compares them with the keypoints stored for each object in the database. Second, a formula is defined to compute the validity of the attended region from the accuracy of the fit of the SIFT keypoints and the probable number of SIFT keypoints in the region. A comparison of the proposed approach with the classic recall-precision evaluation criterion demonstrates its superiority.
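The attended-region extraction step described in the abstract (take a salient region, pick the strongest scale-space blob inside it, merge the two regions) can be sketched roughly as follows. This is an illustrative approximation, not the dissertation's method: a scale-normalized Laplacian of Gaussian is swapped in for the scale-space primal sketch, the salient region is assumed to be given as a boolean mask, the blob is drawn as a disk of radius sqrt(2)·sigma, and all function names are hypothetical.

```python
import numpy as np

def normalized_log(gray, sigma):
    """Sign-flipped, scale-normalized Laplacian of Gaussian.

    Bright blobs on a dark background give positive peaks. Filtering is done
    in the frequency domain, so image borders wrap around.
    """
    gray = np.asarray(gray, dtype=float)
    fy = np.fft.fftfreq(gray.shape[0])[:, None]
    fx = np.fft.fftfreq(gray.shape[1])[None, :]
    f2 = fx ** 2 + fy ** 2
    gauss = np.exp(-2.0 * (np.pi * sigma) ** 2 * f2)   # Gaussian smoothing
    laplace = -(2.0 * np.pi) ** 2 * f2                 # Laplacian operator
    response = np.real(np.fft.ifft2(np.fft.fft2(gray) * gauss * laplace))
    return -(sigma ** 2) * response

def strongest_blob_in_region(gray, salient_mask, sigmas):
    """Return (row, col, sigma) of the largest response inside the mask.

    Simplification: the global maximum over position and scale is used
    instead of a full local-extremum search.
    """
    best = None
    for sigma in sigmas:
        resp = np.where(salient_mask, normalized_log(gray, sigma), -np.inf)
        y, x = np.unravel_index(np.argmax(resp), resp.shape)
        if best is None or resp[y, x] > best[0]:
            best = (resp[y, x], int(y), int(x), float(sigma))
    return best[1], best[2], best[3]

def merge_regions(salient_mask, y, x, sigma):
    """Union of the salient mask and a disk approximating the blob support."""
    yy, xx = np.indices(salient_mask.shape)
    blob_disk = (yy - y) ** 2 + (xx - x) ** 2 <= 2.0 * sigma ** 2
    return salient_mask | blob_disk
```

For example, calling `strongest_blob_in_region(gray, mask, [2, 3, 4, 5, 6])` and passing the result to `merge_regions` yields a coarse mask that is meant to cover the object rather than trace its boundary, in line with the abstract's description of the segmentation as coarse but safe.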
