Research on Scene Understanding Methods Based on Probabilistic Graphical Models

【Author】 Mao Ling

【Supervisor】 Xie Mei

【Author Information】 University of Electronic Science and Technology of China, Signal and Information Processing, 2013, Ph.D.

【Abstract】 Scene understanding is both a fundamental problem and an ultimate goal of computer vision, and its results have been widely applied in fields such as robot navigation, security, medical treatment and web search. Following the idea of "divide and conquer", each branch task of scene understanding, including object detection, image segmentation and scene classification, has achieved breakthroughs. However, holistic scene understanding is still far from being achieved. In recent years, following the idea of "merging the subtasks into one", researchers have proposed semantic segmentation, and later joint object detection and semantic segmentation, as ways of integrating the branch tasks to reach the ultimate goal of scene understanding. Semantic segmentation achieves scene understanding to a certain extent and is also the basis for inferring other high-level semantics. Joint object detection and semantic segmentation additionally localizes each object and provides the number of objects while performing semantic segmentation. However, current research results are not yet satisfactory. This dissertation therefore focuses on the research hotspots and difficulties of object detection, semantic segmentation, and joint object detection and semantic segmentation, and proposes solutions based on probabilistic graphical models to overcome the shortcomings of existing methods. The main contributions are as follows.

1. This dissertation studies how to build advanced conditional random field (CRF) models that accurately reflect the constraints present in real visual scenes and thus improve semantic segmentation performance. Three models are proposed:

(1) A pairwise CRF model based on an enhanced texton map (model I). The model consists of a unary term and a pairwise term: the unary term is given by a JointBoost classifier, and the pairwise term encodes a smoothness constraint between adjacent pixels. Its simple form simplifies the learning of the model parameters. To describe texture better, the original texton map is enhanced with the local descriptors LBP, SIFT and Color SIFT. To obtain more discriminative features, texton-layout filters, which introduce shape, location and context information, are defined on the enhanced texton map and used as the weak classifiers of JointBoost. Experimental results show that the model achieves good semantic segmentation performance.

(2) A higher-order CRF model based on a global same-topic constraint (model II). To overcome the limitations of model I, a higher-order term reflecting the global same-topic constraint is introduced. First, the input image is segmented several times with normalized cuts; second, segments sharing the same topic are discovered with a topic model; then the higher-order term is defined on these same-topic segments; finally, the higher-order term is combined with model I by weighting to obtain the higher-order CRF model. The model considers not only the constraint that local texture features place on pixel categories but also the global constraint that same-topic segments should share a category. It achieves good semantic segmentation results in the experiments.

(3) A hierarchical CRF model that fuses two basic processing units, pixels and segments. The model consists of an observation data layer, a pixel layer and a segmentation layer. The observation data layer is the original image. The pixel layer is model I with pixels as the processing unit; it reflects the constraint of local texture on pixel categories and the smoothness constraint between neighbouring pixels. The segmentation layer is model I with segments as the processing unit; it reflects the constraint of region descriptors on segment categories, a region consistency constraint, and the smoothness constraint between neighbouring segments. An association energy term defined on each segment and the pixels within it fuses the two units and overcomes the drawbacks of using only one processing unit. Two methods are used to generate the segmentation layer: a multiple-segmentation scheme and constrained parametric min-cuts. In addition, a new first- and second-order pooling method is proposed to describe segmented regions more stably and reliably.

2. An object detection method based on partial least squares (PLS) analysis is proposed. First, a multi-scale sliding-window search is performed over the input image, and a high-dimensional feature description of each window is obtained by dense sampling. Second, PLS extracts a small number of latent components from the original high-dimensional features to form a low-dimensional feature space, giving a new object representation. A method based on the model quality ratio is then proposed to determine the optimal number of latent components. Finally, mean shift with a Gaussian kernel performs non-maximum suppression, removing overlapping bounding boxes to give the final detections. Experimental results show that PLS outperforms PCA in dimensionality reduction and yields more discriminative low-dimensional features, and that the detector outperforms the classic algorithm proposed by Dalal.

3. A new higher-order CRF model is proposed to solve joint object detection and semantic segmentation. Its basic idea is to add an object detection higher-order energy term to model II, so that the object detector's judgement of the categories of pixels inside a detection window enters the energy function as a constraint. This constraint "competes" with the other constraints, namely local texture features, the smoothness prior between pixels and the category consistency of pixels within a segment, to jointly determine the category of each pixel. Two methods for generating the detection energy term are proposed: the first uses the detector's results directly; the second extracts both global shape features and local texture features from the bounding box, obtains a more robust representation through first- and second-order pooling, and then computes the detection energy term from the more accurate detection confidence produced by a logistic regression classifier. Experimental results show that the model accomplishes object detection and semantic segmentation simultaneously, improves semantic segmentation performance, and outperforms many current semantic segmentation algorithms.
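The abstract describes models I and II only in words. As a rough illustration of how such an energy is evaluated, the sketch below scores a labelling with three kinds of terms: a unary term taken as the negative log of per-pixel class probabilities (standing in for the JointBoost output), a plain Potts pairwise smoothness term, and, for same-topic segments in the spirit of model II, a robust P^n Potts higher-order term. The robust P^n form, the weights and all function names are assumptions chosen for illustration; the dissertation's exact potentials are not given in the abstract.

```python
import math

def unary_energy(labels, probs):
    """Sum of -log P(x_i = label | image); probs[i][l] stands in for classifier output."""
    return sum(-math.log(probs[i][l]) for i, l in enumerate(labels))

def potts_pairwise(labels, edges, weight):
    """Potts smoothness: pay `weight` for each neighbouring pair with different labels."""
    return sum(weight for i, j in edges if labels[i] != labels[j])

def robust_pn_potts(labels, segment, gamma_max, truncation):
    """Robust P^n Potts cost of one segment (a list of pixel indices)."""
    counts = {}
    for i in segment:
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    disagreeing = len(segment) - max(counts.values())  # pixels off the dominant label
    return min(disagreeing * gamma_max / truncation, gamma_max)

def crf_energy(labels, probs, edges, segments, w_pair, w_high, gamma_max=1.0, truncation=2):
    """Unary + pairwise (model I style) + weighted higher-order segment terms (model II style)."""
    energy = unary_energy(labels, probs) + potts_pairwise(labels, edges, w_pair)
    energy += w_high * sum(robust_pn_potts(labels, s, gamma_max, truncation) for s in segments)
    return energy

if __name__ == "__main__":
    # Toy 1x4 image with two classes; pixels 0-2 form one hypothetical same-topic segment.
    probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
    edges = [(0, 1), (1, 2), (2, 3)]
    segments = [[0, 1, 2]]
    for labels in ([0, 0, 1, 1], [0, 0, 0, 1]):
        print(labels,
              round(crf_energy(labels, probs, edges, segments, 0.5, 0.0), 3),
              round(crf_energy(labels, probs, edges, segments, 0.5, 1.0), 3))
```

With these toy numbers the unary and pairwise terms alone favour [0, 0, 1, 1], while adding the segment term makes [0, 0, 0, 1] cheaper, which illustrates how a same-topic constraint can override weak local evidence. Inference (for example graph cuts) is a separate step not shown here.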
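The abstract does not specify how its first- and second-order pooling works. One common way to pool a variable number of local descriptors from a region or bounding box, assumed here purely for illustration, is to concatenate their mean (first order) with the upper triangle of their averaged outer product (second order), then apply signed square-root and L2 normalisation. The function name and the normalisation steps are assumptions; some published second-order pooling schemes also apply a matrix logarithm, which this pure-Python sketch omits.

```python
import math

def first_second_order_pool(descriptors):
    """Pool local descriptors (equal-length lists) into one fixed-length region feature."""
    n, d = len(descriptors), len(descriptors[0])
    # First order: mean descriptor.
    mean = [sum(v[j] for v in descriptors) / n for j in range(d)]
    # Second order: averaged outer product, upper triangle only (the matrix is symmetric).
    second = [sum(v[a] * v[b] for v in descriptors) / n
              for a in range(d) for b in range(a, d)]
    feat = mean + second
    # Signed square-root (power) normalisation, then L2 normalisation.
    feat = [math.copysign(math.sqrt(abs(x)), x) for x in feat]
    norm = math.sqrt(sum(x * x for x in feat)) or 1.0
    return [x / norm for x in feat]

if __name__ == "__main__":
    region = [[1.0, 2.0], [3.0, 4.0]]  # two toy 2-D descriptors
    print(len(first_second_order_pool(region)))  # d + d(d+1)/2 = 5
```

The output length d + d(d+1)/2 does not depend on how many descriptors the region contains, which is what lets regions or boxes of different sizes be fed to one classifier such as logistic regression.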
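Contribution 2 names PLS as the dimensionality-reduction step but the abstract gives no algorithmic detail. A standard way to compute PLS with a single response (for example +1 for object windows and -1 for background) is the NIPALS iteration, sketched below in plain Python on toy data. It takes the number of latent components as a fixed argument; the dissertation's quality-ratio criterion for choosing that number is not described in the abstract and is not reproduced. All names are illustrative, and at realistic feature dimensions a library implementation such as scikit-learn's `PLSRegression` would be the practical choice.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on rows X (n x d) and scalar targets y; returns (x_mean, W, P)."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xr = [[row[j] - x_mean[j] for j in range(d)] for row in X]  # centred residual of X
    yr = [v - y_mean for v in y]                                # centred residual of y
    W, P = [], []
    for _ in range(n_components):
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(d)]  # X^T y
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:  # y fully explained; no further components
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xr]  # latent scores of the training windows
        tt = _dot(t, t)
        p = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(d)]  # X loadings
        q = _dot(yr, t) / tt                                                  # y loading
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]   # deflate X
        yr = [yr[i] - q * t[i] for i in range(n)]                             # deflate y
        W.append(w)
        P.append(p)
    return x_mean, W, P

def pls1_transform(x, x_mean, W, P):
    """Map one high-dimensional feature vector to its low-dimensional latent scores."""
    r = [a - b for a, b in zip(x, x_mean)]
    scores = []
    for w, p in zip(W, P):
        t = _dot(r, w)
        scores.append(t)
        r = [ri - t * pi for ri, pi in zip(r, p)]  # same deflation as during fitting
    return scores

if __name__ == "__main__":
    # Toy windows: feature 0 separates objects (+1) from background (-1); 1 and 2 are noise.
    X = [[2.0, 0.1, 1.0], [1.8, -0.2, 0.5], [-2.0, 0.0, 0.9], [-1.9, 0.3, 0.4]]
    y = [1.0, 1.0, -1.0, -1.0]
    model = pls1_fit(X, y, n_components=1)
    print([round(pls1_transform(x, *model)[0], 2) for x in X])
```

Unlike PCA, each weight vector here is driven by the covariance between features and labels, which is the usual argument for why PLS components tend to be more discriminative for detection.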
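For the last step of contribution 2, the abstract says mean shift with a Gaussian kernel performs non-maximum suppression. The sketch below follows the general idea of mean-shift detection fusion described by Dalal: each positive-scoring window becomes a point (x, y, log scale) weighted by its score, mean shift moves every point to a local mode of the weighted kernel density, and windows whose modes coincide are fused into one detection. For brevity it uses one fixed per-dimension bandwidth instead of scale-adaptive bandwidths; the bandwidth values, the merge threshold and the function name are assumptions for illustration.

```python
import math

def mean_shift_nms(dets, bandwidth=(8.0, 16.0, math.log(1.3)),
                   tol=1e-3, max_iter=100, merge_dist=1.0):
    """Fuse overlapping windows. dets: (x_centre, y_centre, scale, score) tuples.
    Returns one (x, y, scale, density) tuple per mode found."""
    pts = [(x, y, math.log(s)) for x, y, s, _ in dets]  # search in log-scale
    wts = [max(d[3], 0.0) for d in dets]                 # only positive scores vote

    def dist(a, b):  # distance in bandwidth-normalised units
        return math.sqrt(sum(((ai - bi) / h) ** 2 for ai, bi, h in zip(a, b, bandwidth)))

    def kernel(a, b):  # Gaussian kernel
        return math.exp(-0.5 * dist(a, b) ** 2)

    modes = []
    for start, w0 in zip(pts, wts):
        if w0 == 0.0:
            continue
        pos = start
        for _ in range(max_iter):
            ks = [w * kernel(pos, p) for p, w in zip(pts, wts)]
            total = sum(ks)
            new = tuple(sum(k * p[dim] for k, p in zip(ks, pts)) / total for dim in range(3))
            moved = dist(new, pos)
            pos = new
            if moved < tol:
                break
        density = sum(w * kernel(pos, p) for p, w in zip(pts, wts))
        if all(dist(pos, m) >= merge_dist for m, _ in modes):  # new mode, not a duplicate
            modes.append((pos, density))
    return [(m[0], m[1], math.exp(m[2]), dens) for m, dens in modes]

if __name__ == "__main__":
    dets = [(100, 100, 1.0, 0.9), (102, 101, 1.05, 0.8), (98, 99, 0.95, 0.7),
            (300, 200, 2.0, 0.6), (50, 50, 1.0, -0.5)]
    for det in mean_shift_nms(dets):
        print([round(v, 2) for v in det])
```

On the toy input, three overlapping windows around (100, 100) collapse into one detection, the isolated window at (300, 200) survives on its own, and the negative-scoring window casts no vote, so two fused detections are returned.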
