
The Research on Technology of Image Understanding Based on Biological Visual Perception

【Author】 Hu Dekun

【Supervisor】 Li Jianping

【Author Information】 University of Electronic Science and Technology of China, Computer Application Technology, 2012, PhD

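【Chinese Abstract】 Scene understanding is a challenging problem in computer vision and a key step in related vision applications. Animals can rapidly assess the scene they are in and respond to it, accurately determining the location and category of target objects, a capability that even the most advanced computer vision systems cannot currently match. Building on findings from cognitive physiology and psychology, and starting from the relationship between image understanding and cognitive science, this thesis studies key image understanding technologies based on the important structures and functional mechanisms of the animal visual perception system.

First, the thesis examines in depth the cognitive-physiological structure and perceptual mechanisms of human vision. The retina is the starting point of visual information; it contains three main types of cells that capture different image features in the visual field and transmit them through the corresponding channels of the lateral geniculate nucleus (LGN) to area V1 of the primary visual cortex. The ventral pathway of the visual cortex supports perception and object recognition, passing visual information successively through V1, V2, V3 or V4 (the middle temporal area), and the parietal cortex (OPC) or the inferior temporal cortex (IT); the dorsal pathway processes motion and other spatial information. Feedforward, horizontal, and feedback interactions exist between the levels. The human visual perception system is therefore not only hierarchical but also exhibits lateral inhibition and feedback, which together enable fast and effective visual perception.

Second, the thesis focuses on color image segmentation models based on the perceptual mechanisms of the visual cortex. It proposes a hierarchical, multi-feature perceptual segmentation model for color images. The model makes effective use of the spatial distribution of image brightness, detail information, and color-space information to produce initial segmentations, and then uses a backpropagation neural network (BPNN) to fuse and select among the multi-feature segmentation results to obtain the final segmentation. In addition, drawing on the trickle-down theory of vision, the thesis studies a BU&TD color image segmentation model that combines bottom-up and top-down processing; it uses class-specific feature fragments to perform top-down segmentation and thereby better simulates the feedback process of the visual mechanism.

Furthermore, after an in-depth analysis of existing biologically inspired object recognition models, the thesis proposes a biologically inspired multi-feature scene classification model with two processing stages. The first stage simulates the low-level biological visual areas, extracting three image properties independently and in parallel to classify the scene; the second stage performs a further classification based on the three first-stage results to improve accuracy. Combining the prediction mechanism of the orbitofrontal cortex (OFC) with scene context information, the thesis also studies a BU&TD object recognition model based on biological visual mechanisms. In the training stage, the model builds an LSF (low spatial frequency) library and a GIST feature library from images of specific object classes, and the system automatically learns prior knowledge of the objects and their context. In the testing stage, the low-frequency features and context features extracted from the input image are mapped to the parahippocampal cortex (PHC) and the OFC, respectively, to make predictions, which are then combined with high-frequency detail features to complete object recognition.

Finally, the thesis describes the distinguishing features of the research, summarizes the work, analyzes the experimental results of each model, identifies the strengths and weaknesses of the models, and outlines directions for future work.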
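The multi-feature segmentation model above fuses three initial segmentations with a BPNN, but the thesis abstract gives no network layout, input encoding, or training data. The following is therefore only a hypothetical sketch of the fusion step: each pixel is represented by three binary votes from the color, form, and texture segmenters, and a small network with one hidden layer, trained by backpropagation, learns how to combine them. The class `FusionBPNN`, the helpers `toy_training_set` and `fuse_maps`, the hidden-layer size, and the toy ground-truth rule are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FusionBPNN:
    """Hypothetical 3-input network with one hidden layer, trained by backpropagation.

    Each input is one segmenter's binary vote for a pixel
    (1 = object, 0 = background), ordered as (color, form, texture).
    """

    def __init__(self, n_in=3, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train(self, samples, epochs=5000, lr=0.5):
        """Per-sample gradient descent on squared error."""
        for _ in range(epochs):
            for x, target in samples:
                hidden, out = self.forward(x)
                # Error signal at the sigmoid output unit.
                d_out = (out - target) * out * (1 - out)
                # Back-propagate to the hidden units using the pre-update w2.
                d_hid = [d_out * w * h * (1 - h) for w, h in zip(self.w2, hidden)]
                for j, h in enumerate(hidden):
                    self.w2[j] -= lr * d_out * h
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


def toy_training_set():
    """Hypothetical ground truth: a pixel is 'object' when the color segmenter
    fires and at least one of the form/texture segmenters agrees."""
    return [((c, f, t), float(c and (f or t)))
            for c in (0, 1) for f in (0, 1) for t in (0, 1)]


def fuse_maps(net, color, form, texture):
    """Fuse three equally sized binary segmentation maps pixel by pixel."""
    return [[net.predict((c, f, t)) for c, f, t in zip(rc, rf, rt)]
            for rc, rf, rt in zip(color, form, texture)]
```

Because the fusion weights are learned from labeled pixels, the network can treat one feature as more decisive than the others (in the toy rule, the color vote is required), which a fixed majority vote over the three segmentations cannot express.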

【Abstract】 Scene understanding is a great challenge for state-of-the-art systems and plays an important role in computer vision. Animals, however, when confronted with a visual scene, can quickly arrive at a hypothesis about its main parts, so that an appropriate reaction (e.g., escape) is immediately possible. Building on research results in cognitive physiology and psychology, and considering the correlation between image understanding and cognitive science, this thesis investigates several image understanding models based on the structure and mechanisms of animal visual perception.

The human visual system has ventral and dorsal visual pathways, and object recognition in the cortex is thought to be mediated by the ventral pathway. Three parallel pathways are established in the retina during the early stages of visual information processing and are preserved in the lateral geniculate nucleus (LGN). These three pathways are then rearranged into three concurrent streams running through different compartments of areas V1 and V2. Areas V1 and V2 integrate the input information and project to V3, V4, and the inferotemporal cortex (IT). Based on physiological experiments in monkeys, IT has been postulated to play a central role in object recognition. IT, in turn, is a major source of input to the prefrontal cortex (PFC), which is involved in linking perception to memory. The computational architecture mapped onto these visual areas is hierarchical, with lateral competition and feedback.

To segment an object from its background for advanced vision processing, a novel bio-inspired general framework for image segmentation in complex natural scenes is investigated. It is a hierarchical system that mimics the organization of the layered early visual areas in the primate visual cortex.
The proposed methodology consists of two typical stages. The first stage is a parallel modular structure of three segmentation operators based on color, form, and texture features, each of which solves the segmentation problem independently for the same input. These operators perform computations similar to those of the parvocellular, magnocellular, and koniocellular pathways that run through the LGN from the retina to the primary visual cortex. In the second stage, a fusion operation, multiple-feature fusion segmentation, integrates the three feature segmentations through a backpropagation neural network, simulating the processing of the area that follows the LGN in the primary visual cortex.

Another model closely follows the computation of trickle-up and trickle-down processing in primate visual pathways. The trickle-down path, running from the frontal cortex to the lower-level visual areas, predicts incoming stimuli based on prior knowledge of the classes. The computational model of this pathway consists mainly of a covering operator, which covers the result of the trickle-up process with fragments of the specific class. Two important computations in the trickle-down stage, an associative method and an optimal method based on Bayesian inference, are also discussed as ways to improve the model's performance. The proposed approaches are applied in several segmentation experiments on single objects in cluttered conditions, and the results show that they are competitive with state-of-the-art systems.

The early visual areas of animals integrate multiple image features to answer the "where" and "what" questions about a scene. This work presents a scene image classification model that extends the hierarchical feed-forward model of the visual cortex. First, each of three classification paths independently uses one image property (shape-, edge-, or color-based features).
Then, a single classifier assigns the category of an image based on the probability distributions output by the three paths. Experiments show that the model improves classification accuracy over the shape-based model, and the proposed approach achieves high accuracy, comparable to other reported methods, on a publicly available color image dataset. The second model, for object perception, mimics the computation of the trickle-up and trickle-down processes in the primate visual pathway. In the trickle-up process, high-spatial-frequency information is extracted from an image and optimized to preserve the invariance and selectivity of object representations. In parallel, the trickle-down computation uses low-spatial-frequency components to predict the possible objects and the most likely context. Object recognition is completed by combining the detailed information from the trickle-up process with the context- and gist-based predictions from the trickle-down process. Several recognition experiments based on prior knowledge of objects and scenes demonstrate that the proposed approach performs well at object recognition. Beyond its relevance to computer vision, the success of this approach suggests a plausible way to combine forward and backward processes for object perception and scene identification.

Finally, the merits and shortcomings of these models are analyzed, and directions for future work are outlined at the end of the thesis.
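The trickle-down prediction described above relies on separating an image into low- and high-spatial-frequency components and using a coarse, gist-like summary of the low-frequency part to predict the likely context. The thesis abstract does not give its filters or descriptor settings, so the sketch below uses hypothetical stand-ins: a box blur as the low-pass filter, the residual as the high-frequency detail, a grid of cell means of the LSF image as a crude gist vector, and nearest-neighbor lookup in a small labeled library as the context prediction. The standard GIST descriptor instead pools oriented filter responses over a spatial grid; all function names and the library format here are assumptions.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window.

    At the borders the window is truncated to the pixels inside the image.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def split_frequencies(img, radius=1):
    """Return (low, high): the blurred LSF image and the residual HSF detail."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(rp, rl)] for rp, rl in zip(img, low)]
    return low, high


def gist_descriptor(low, grid=2):
    """Crude gist-like vector: mean LSF intensity in each cell of a grid x grid layout.

    Assumes grid <= image height and width so that no cell is empty.
    """
    h, w = len(low), len(low[0])
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [low[y][x]
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            desc.append(sum(cell) / len(cell))
    return desc


def predict_context(desc, library):
    """Return the label of the stored (label, descriptor) pair nearest to desc."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(desc, item[1]))
    return min(library, key=sq_dist)[0]
```

In this sketch, the low-frequency branch alone is enough to pick a coarse context label, while the high-frequency residual is kept for the detailed (trickle-up) recognition stage, mirroring the division of labor described in the abstract.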
