Research on Semantic Image Segmentation Based on an Improved Selective Visual Attention Model (SVAM)

【Author】 刘尚旺

【Supervisor】 何东健

【Author Information】 Northwest A&F University (西北农林科技大学), Agricultural Electrification and Automation, 2012, PhD

【Abstract】 Automatic semantic image segmentation is a challenging task in computer vision and a key step toward image understanding. However, a user's understanding or retrieval of an image cannot be fully expressed by the low-level features that image processing algorithms extract, so one of the serious problems confronting image segmentation and retrieval is the large semantic gap between low-level image features and high-level semantics.

The selective visual attention model (SVAM) is a computational model proposed to mimic the attention mechanism of the human visual system. It can locate the region of an image that most strongly attracts human attention, and it can therefore support semantic image segmentation. The pulse-coupled neural network (PCNN), a main representative of the third generation of artificial neural networks, also performs well in image segmentation. To further improve the accuracy of semantic image segmentation, this dissertation studies automatic semantic image segmentation models and methods built on integrated SVAM+PCNN models. The main research contents and conclusions are summarized as follows.

(1) The output of a PCNN is a binary image, while the salient-region detection result of an SVAM is a gray-scale saliency map, so the two are difficult to compare directly and fairly. To solve this problem, this dissertation improves receiver operating characteristic (ROC) curve analysis: the final color segmentation results of both models are first converted to gray-scale images, and ROC analysis is then applied to those gray-scale results on the same footing. Experimental results show that the improved ROC analysis can effectively evaluate different kinds of image segmentation models and methods. A second problem is how to show that an integrated model differs significantly from its component models when the usual numerical performance indices differ only slightly.
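As a minimal sketch of the ROC/AUC evaluation described in (1), the following compares a gray-scale saliency map against a binary ground-truth mask by sweeping a threshold. The toy images, the threshold sweep, and the function name `roc_auc` are illustrative assumptions, not the dissertation's actual code or data.

```python
import numpy as np

def roc_auc(saliency, ground_truth, thresholds=256):
    """ROC points and AUC for a gray-scale saliency map against a
    binary ground-truth mask (1 = salient object)."""
    s = saliency.ravel().astype(float)
    g = ground_truth.ravel().astype(bool)
    pos, neg = g.sum(), (~g).sum()
    tpr, fpr = [], []
    # Sweep thresholds from high to low so the curve runs toward (1, 1).
    for t in np.linspace(s.max(), s.min(), thresholds):
        pred = s >= t
        tpr.append((pred & g).sum() / pos)
        fpr.append((pred & ~g).sum() / neg)
    # Trapezoidal area under the (fpr, tpr) curve.
    auc = 0.0
    for i in range(1, len(fpr)):
        auc += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return auc

# Toy example: a map that is brighter inside the true object scores 1.0.
gt = np.zeros((8, 8), dtype=int)
gt[2:6, 2:6] = 1
sal = gt * 0.8 + 0.1          # perfectly ordered saliency values
print(roc_auc(sal, gt))        # -> 1.0 (perfect separation)
```

A model whose saliency values rank every object pixel above every background pixel reaches AUC 1.0, while an inverted map scores near 0, which is what makes AUC usable as the common yardstick for both binary PCNN outputs and gray-scale SVAM maps.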
To solve this problem, this dissertation introduces the mean square deviation and the two-tailed Student's t-test into the significance analysis between models. Experimental results indicate that this index and this method can effectively evaluate image segmentation models and methods.

(2) Because the region of interest (ROI) extracted by the Saliency Toolbox (STB)/Itti model is too small for effective semantic image segmentation, the integrated STB/Itti+PCNN model is proposed. It takes the fused color and orientation feature maps extracted by the STB/Itti model as the input image of the PCNN, strengthening the PCNN's global coupling and noise resistance; the saliency map generated by STB/Itti determines which PCNN neurons should fire, so the PCNN needs no repeated iterations to find that range; and the PCNN replaces the winner-take-all (WTA) network of STB/Itti to output the segmentation result. Furthermore, because a PCNN that lacks an iterative process loses some of its segmentation capability, two strategies preserve that capability: the special input image mentioned above enhances the global coupled modulation function, and the iterative, localized normalization strategy of STB/Itti assists the PCNN in pulse synchronization. Experimental results show that STB/Itti+PCNN performs semantic image segmentation effectively: its mean AUC is 127.94% higher than that of STB/Itti, the significance probability of the difference is above 0.99, and the model is highly robust against noise and geometric attacks.

(3) To evaluate semantic image segmentation results scientifically, eight criteria for the best semantic image segmentation are proposed on the basis of the experiments in this research and the related literature: 1) is grounded, to some extent, in a biological visual mechanism; 2) requires no training samples and no tunable parameters; 3) attends first to the largest salient region; 4) uniformly covers the whole salient object; 5) gives segmented salient objects well-defined boundaries; 6) resists noise and geometric attacks; 7) outputs full-resolution segmentation results in real time; 8) can be implemented in hardware. These criteria guide the design, construction, and implementation of the semantic image segmentation models in this research.

(4) To obtain the best semantic image segmentation results, the GBVS+PCNN model is proposed, following a coarse-to-fine segmentation strategy. After the visual effects, performance indices, and average running times of nine existing SVAMs are compared, GBVS is chosen for coarse segmentation; the intensity feature map extracted by GBVS serves as the special input image of a PCNN that extends GBVS to perform fine segmentation; finally, a salient-region identification algorithm based on an "AUC value" decision criterion outputs the final segmentation result automatically. Experimental results show that GBVS+PCNN meets the first seven of the eight criteria, while its PCNN extension meets all eight. A two-tailed Student's t-test gives a significance probability above 0.99 for the difference between GBVS+PCNN and GBVS.

(5) The saliency map generated by the PQFT model contains too much redundant low-frequency information to localize the salient object well. To mimic the center-excitation/surround-inhibition mechanism of biological visual neurons, a simplified center-surround (C-S) operation is proposed in which the surrounding region of each pixel is determined by the linking coefficient that the PCNN sets automatically.
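The center-surround differencing just described can be sketched as follows on a single channel. Here a fixed box surround of radius 2 stands in for the surround region that the dissertation derives from the PCNN linking coefficient; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def center_surround(channel, radius=2):
    """Simplified C-S operation: each pixel (center excitation) minus the
    mean of its surrounding box (surround inhibition). `radius` stands in
    for the surround extent derived from the PCNN linking coefficient."""
    h, w = channel.shape
    out = np.zeros_like(channel, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            box = channel[y0:y1, x0:x1].astype(float)
            # Exclude the center itself from the surround mean.
            total = box.sum() - float(channel[y, x])
            out[y, x] = float(channel[y, x]) - total / (box.size - 1)
    return out

# A bright spot on a dark background yields a strong positive response.
img = np.zeros((7, 7))
img[3, 3] = 1.0
resp = center_surround(img)
print(resp[3, 3])   # -> 1.0 (center pops out); flat regions stay near 0
```

Applied to the three CIE Lab channel-difference images, responses of this kind would supply the imaginary quaternion coefficients that IPQFT uses, suppressing flat (low-frequency) regions while keeping local contrast.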
The three channel-difference images of the CIE Lab color space computed by this C-S operation are taken as the imaginary coefficients of PQFT's quaternion input image, yielding an improved PQFT model named IPQFT, which greatly reduces the redundant low-frequency information in the saliency map. Additionally, because the "AUC value" salient-region identification algorithm accounts for more than 5% of the average running time of its integrated model, a functionally similar algorithm based on a "size change" decision criterion is proposed; it reduces the algorithm's average time per test image from 102.0 ms to 16.1 ms in the MATLAB environment.

(6) To meet the practical demand for real-time semantic image segmentation that can be implemented in hardware, the integrated IPQFT+PCNN model is proposed. As in the GBVS+PCNN model, IPQFT performs coarse segmentation; the PCNN extends IPQFT and performs fine segmentation; and the "size change" salient-region identification algorithm outputs the final segmentation result automatically. Experimental results show that IPQFT+PCNN processes a test image in 238.2 ms on average in the MATLAB environment, which meets the real-time requirement. Because its main algorithms, the Fourier transform in IPQFT and the PCNN, can both be implemented in hardware, the integrated model is also easy to realize in an integrated circuit. Moreover, IPQFT+PCNN is robust to noise and geometric attacks, and it is parallel, automatic, and intelligent.

(7) Because comprehensive evaluation of the numerical performance indices of semantic image segmentation models and methods remains imperfect, a total-score decision table is proposed in accordance with the eight criteria, enriching the evaluation methods and index system of SVAMs.
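The "size change" decision criterion can be illustrated as follows: walk through successive candidate masks and stop once the salient-region size stabilizes. The sequence of masks, the tolerance, and the function name are hypothetical; the real algorithm's inputs and stopping rule may differ in detail.

```python
import numpy as np

def select_by_size_change(masks, tol=0.01):
    """Illustrative 'size change' decision: return the first iteration at
    which the salient-region size changes by less than `tol` (relative)
    between consecutive binary masks, together with that mask."""
    prev = masks[0].sum()
    for k, m in enumerate(masks[1:], start=1):
        size = m.sum()
        if prev and abs(size - prev) / prev < tol:
            return k, m           # size has stabilized -> final result
        prev = size
    return len(masks) - 1, masks[-1]

# Toy sequence: the region grows, then stabilizes at iteration 3.
base = np.zeros((10, 10), dtype=int)
seq = []
for n in (20, 40, 50, 50, 50):
    m = base.copy()
    m.ravel()[:n] = 1
    seq.append(m)
print(select_by_size_change(seq)[0])   # -> 3
```

Unlike the "AUC value" criterion, this test needs only a region-size comparison per iteration, which is consistent with the reported drop in average decision time from 102.0 ms to 16.1 ms.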
According to the total scores of the decision table, the three integrated models proposed in this dissertation outperform the nine existing SVAMs considered in this research in semantic image segmentation and significantly improve its accuracy.
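The total-score decision table of (7) can be sketched as a simple per-criterion checklist. The rows and 0/1 scores below are invented for illustration (only GBVS+PCNN meeting seven of the eight criteria comes from the abstract), so they should not be read as the dissertation's actual table.

```python
# Hypothetical total-score decision table over the eight criteria.
CRITERIA = [
    "biological mechanism", "no training/parameters", "largest region first",
    "uniform coverage", "well-defined boundaries", "noise/geometry robust",
    "real-time full resolution", "hardware implementable",
]

def total_score(row):
    """Each entry is 1 if the model satisfies the criterion, else 0."""
    assert len(row) == len(CRITERIA)
    return sum(row)

table = {
    "STB/Itti":   [1, 1, 0, 0, 0, 0, 0, 1],   # invented scores
    "GBVS+PCNN":  [1, 1, 1, 1, 1, 1, 1, 0],   # seven of eight (per abstract)
    "IPQFT+PCNN": [1, 1, 1, 1, 1, 1, 1, 1],   # invented scores
}
ranking = sorted(table, key=lambda m: total_score(table[m]), reverse=True)
print(ranking[0])   # -> IPQFT+PCNN
```

Summing binary per-criterion scores keeps the comparison transparent: each model's total is directly traceable to which of the eight criteria it fails.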
