基于模式识别的双目立体视觉匹配研究

Binocular Stereo Matching Research Based on Pattern Recognition

【Author】 彭祺 (Peng Qi)

【Supervisor】 仲思东 (Zhong Sidong)

【Author information】 Wuhan University, Signal and Information Processing, 2013, PhD

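【Abstract (Chinese)】 With the rapid development of science and technology, machine vision applications play an increasingly numerous and important role in everyday life, for example in security surveillance, robot navigation, and 3-D digital virtual reality. Expectations of machine vision keep rising, and many applications are now moving from 2-D to 3-D visualization. As the 3-D imaging system closest to the structure of human vision, the binocular stereo vision system is a current research hotspot in machine vision. It mimics the human visual system by recovering 3-D geometry from a pair of left and right 2-D images, and stereo matching is a crucial and very difficult step in this process. In essence, stereo matching takes one image as the reference and searches the other image for the corresponding (homonymous) points. Images of the natural world are highly complex; some special scenes can even deceive the human visual system, the most intelligent of all, so requiring a computer to analyze all kinds of complex images correctly and find the corresponding points is a very challenging task. This thesis is built around the task of binocular stereo matching. Research on binocular stereo matching has been going on for decades: from early feature search based on the epipolar constraint, to Markov random field (MRF) optimization, then image segmentation algorithms, and most recently parallel algorithms. Following the vision framework proposed by Marr, the field has been advancing from low-level processing toward high-level image understanding. Matching in weakly textured and occluded regions has always been the hard part of binocular stereo matching. For the technique to be used across production and daily life, it must handle all kinds of complex natural scenes while keeping a reasonable computational efficiency. Traditional image processing methods have hit a bottleneck on these problems; a new breakthrough requires moving toward high-level image understanding and performing stereo matching with artificial intelligence and pattern recognition methods. Introducing the relevant theory and techniques of pattern recognition into binocular stereo matching is the focus of this thesis. The main research work and contributions are as follows:
(1) Chapter 1 compares and discusses five classes of current mainstream stereo matching algorithms, together with the problems and difficulties they face. Chapter 2 introduces the principles of binocular stereo vision and the classification of stereo matching algorithms, and discusses in detail how parts of pattern recognition theory relate to binocular stereo matching.
(2) Image acquisition devices have developed very rapidly in recent years, so high-resolution images and videos are easy to obtain, but they pose a great challenge to computer processing. A few years ago, some MRF-based stereo matching algorithms took more than ten minutes to process a 450×375 image, whereas an ordinary camera now captures images of up to 6000×4000 pixels; matching such a stereo pair with traditional methods takes too long to be of practical value. Chapter 3 studies this problem in depth and proposes a stereo matching method constrained by affine-invariant convergent triangles. The method is highly accurate for large outdoor scenes; for complex scenes with drastic disparity changes, it identifies and matches regions by grouping points, which handles occlusion well. Because it relies on affine-invariant geometric constraints rather than on the attributes of the pixels themselves, it is very fast, on the order of milliseconds.
(3) Highly reliable matched point pairs, i.e. generalized ground control points (GGCPs), play a very important role in stereo matching. How to obtain reliable GGCPs automatically is the subject of Section 1 of Chapter 4, which proposes a K-means clustering algorithm for feature points on surface elements at different depths. It screens the initially obtained SURF feature matches and keeps only the pairs that satisfy the constraints. Compared with traditional feature matching, the algorithm fully accounts for the depth constraints of objects in space, robustly clusters points in the 2-D image, and yields highly reliable GGCPs.
(4) For weakly textured images, segmentation-based stereo matching has many desirable properties that other algorithms lack, but it depends heavily on segmentation quality, and errors introduced by segmentation are hard to correct in later matching steps. Section 2 of Chapter 4 studies this problem in detail. It introduces fuzzy theory from pattern recognition to assign individual, "meaningless" pixels to line primitives and area primitives with richer semantics. Matching each area primitive as a whole effectively overcomes the adverse effects of weakly textured regions; in particular, the line and area primitives can still be corrected during subsequent matching, which avoids primitive-extraction and matching errors that would otherwise be hard to correct later.
(5) Dense stereo matching generally requires long computation, and this work traces the cause to polling every candidate disparity at every pixel. Section 3 of Chapter 4 applies the ideas of syntactic pattern recognition: it represents a complex scene image as a hierarchy of pattern relations, expresses the complex data structure with multidimensional matrices, proposes a point-and-line jumping disparity propagation algorithm, and introduces dynamic programming to generate the disparity map step by step. Compared with traditional methods, this approach has a certain "intelligence": it first recognizes the complex scene as semantically rich primitives and then matches them by their features. It is fast and adapts well to different scenes.
(6) Recognizing and removing the shadows of moving objects has long been difficult in moving-object tracking and localization. Chapter 5 uses a binocular stereo vision system and its theory to remove the cast shadows of moving objects outdoors and indoors, with good results.
All matching algorithms proposed in this thesis follow the ideas of artificial intelligence and pattern recognition and are held to strict computational-speed requirements, ensuring that they are practical.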
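Item (2) attributes the method's millisecond speed to affine-invariant geometric constraints that do not depend on pixel attributes. The abstract does not describe the convergent-triangle construction itself, so the sketch below illustrates only the general property such constraints rely on and is not the thesis's algorithm: the barycentric coordinates of a point with respect to a triangle are preserved by any affine map. Assuming three reliably matched vertices and an approximately affine left-to-right mapping inside the triangle, applying the same weights to the right-image vertices predicts where a left-image point should appear. The names `barycentric` and `transfer_point` are hypothetical.

```python
import numpy as np


def barycentric(p, tri):
    """Barycentric weights (w_a, w_b, w_c) of 2-D point p w.r.t. triangle tri (3x2)."""
    a, b, c = np.asarray(tri, dtype=float)
    # Solve p = a + u*(b - a) + v*(c - a) for (u, v).
    m = np.column_stack((b - a, c - a))
    u, v = np.linalg.solve(m, np.asarray(p, dtype=float) - a)
    return np.array([1.0 - u - v, u, v])


def transfer_point(p_left, tri_left, tri_right):
    """Predict the right-image position of p_left (hypothetical helper).

    Assumption: the left-to-right mapping is affine inside the triangle.
    Barycentric weights are affine-invariant, so the left-image weights,
    applied to the matched right-image vertices, give the corresponding point.
    """
    w = barycentric(p_left, tri_left)
    return w @ np.asarray(tri_right, dtype=float)


# Usage: the right triangle is the left one under x' = A x + t,
# with A = [[1.1, 0.2], [0.0, 0.9]] and t = (-5, 2).
tri_l = [(0, 0), (10, 0), (0, 10)]
tri_r = [(-5, 2), (6, 2), (-3, 11)]
r = transfer_point((2.0, 3.0), tri_l, tri_r)
print(np.round(r, 6))  # approximately [-2.2, 4.7], i.e. A @ (2, 3) + t
```

Because the prediction uses only vertex coordinates and never reads pixel intensities, each point costs a single 2×2 linear solve, independent of image resolution.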

【Abstract】 With the rapid development of science and technology, machine vision applications play an increasingly important role in everyday life, for example in security monitoring, robot navigation, and 3-D digital virtual reality. At the same time, the demands placed on machine vision keep rising, and many machine vision applications are currently moving from two-dimensional to three-dimensional visualization. As the 3-D system closest to the characteristics of human vision, the binocular stereo vision system is a research hotspot in machine vision. It simulates the human visual system by recovering 3-D geometry from a pair of left and right images, and stereo matching is a critical and very difficult step in this process. In essence, stereo matching takes one image as the reference and searches the other image for the corresponding (homonymous) points. Images of the natural world are very complex; some special scenes can even create illusions for the human visual system, the most intelligent one there is. Requiring a computer to analyze such complex images correctly and find the corresponding points is therefore a very challenging task. This thesis is built around the binocular stereo matching task. Binocular stereo matching has been studied for decades: from early feature search based on the epipolar constraint, to the application of Markov random field optimization, to image segmentation algorithms, and most recently to parallel algorithms. Research in the field follows the visual theoretical framework introduced by Marr, moving from low-level processing toward high-level image understanding. Weakly textured and occluded areas have always been the difficult cases in binocular stereo matching.
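The search for corresponding points described above can be illustrated with the simplest local baseline. The sketch below is a generic textbook method, not one proposed in the thesis, and it assumes a rectified grayscale pair so that the epipolar line is simply the image row. It scores each candidate disparity with the sum of absolute differences (SAD) over a small window and keeps the cheapest one (winner-takes-all). The function `block_match` and its parameters `max_disp` and `half` (window half-width) are hypothetical.

```python
import numpy as np


def block_match(left, right, max_disp, half=2):
    """Winner-takes-all SAD block matching on a rectified grayscale pair.

    Left image is the reference: left pixel (y, x) is compared with right
    pixel (y, x - d) for d in 0..max_disp, i.e. only along the same row.
    Border pixels (within `half` of the edge) are left at disparity 0.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int64)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # Every pixel polls every disparity: work grows as H * W * D.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp


# Usage: a random-texture right image, and a left image shifted 3 px to the right.
rng = np.random.default_rng(0)
right_img = rng.integers(0, 256, size=(20, 40))
left_img = np.roll(right_img, 3, axis=1)
d = block_match(left_img, right_img, max_disp=6)
print(d[10, 5:10])  # [3 3 3 3 3]
```

The nested loops make the cost explicit: every pixel is compared against every candidate disparity, so the work grows roughly with image width × height × disparity range (times the window area), which is why high-resolution pairs become expensive. For a rectified rig, a disparity d in pixels converts to depth as Z = f·B/d, where f is the focal length in pixels and B is the baseline.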
To be used in production and in all aspects of daily life, binocular stereo matching must handle a wide variety of complex natural scenes while guaranteeing computational efficiency. Traditional image processing methods have reached a bottleneck on these problems. A new breakthrough requires moving toward high-level image understanding and applying artificial intelligence and pattern recognition methods to stereo matching. Introducing the theories and methods of pattern recognition into binocular stereo matching is the focus of the research in this thesis. The main research work and innovations are as follows:
(1) Chapter 1 compares and discusses five current classes of mainstream stereo matching algorithms, as well as the problems and difficulties they face. Chapter 2 introduces the principles of binocular stereo vision and the classification of stereo matching algorithms, and discusses in detail the relationship between parts of pattern recognition theory and binocular stereo matching.
(2) Image acquisition devices have developed very rapidly in recent years, so high-resolution images and videos are easy to obtain, but such images pose a great challenge to computer processing. Several years ago, some stereo matching algorithms based on Markov random field theory took more than ten minutes to process a 450×375 image, whereas an ordinary camera can now capture images of up to 6000×4000 pixels. With traditional stereo matching algorithms, the computation time becomes too long to have practical value. Chapter 3 studies this issue in depth and proposes a stereo matching method based on an affine-invariant convergent-triangle constraint. The method obtains high-accuracy matching points in outdoor scenes. For complex scenes whose disparity changes drastically, regions are identified and matched by point grouping, which handles occlusion effectively. Because the method relies on affine-invariant geometric constraints and is independent of the attributes of the pixels themselves, it is very fast, running on the order of milliseconds.
(3) Highly reliable matching points, also called generalized ground control points (GGCPs), play a very important role in stereo matching. How to obtain reliable GGCPs automatically is the subject of the first section of Chapter 4, which proposes a K-means clustering algorithm based on area elements at different depths. It screens the initial SURF feature matches, eliminating unreliable pairs and preserving those that satisfy the constraints. Compared with traditional feature matching algorithms, this method takes full account of the depth constraints of objects in space and clusters 2-D points robustly, yielding highly reliable GGCPs.
(4) Compared with other algorithms, stereo matching based on image segmentation has many good properties for weakly textured images. However, it depends heavily on segmentation quality, and errors introduced by segmentation are very difficult to correct in subsequent matching steps. The second section of Chapter 4 studies this issue in detail. It introduces fuzzy theory from pattern recognition so that individual, "meaningless" pixels are assigned to line and area elements with relatively rich semantics. Area matching can then be performed on each area element as a whole, which overcomes the adverse effects of weakly textured regions. Moreover, the line and area elements can still be corrected in later matching steps, avoiding hard-to-correct errors caused by element extraction and matching.
(5) Dense stereo matching usually takes a long time, and this work found the cause to be polling every candidate disparity for every pixel. The third section of Chapter 4 applies the idea of syntactic pattern recognition: a hierarchical relationship model is built for the complex scene image, and the complex data structure is expressed by multidimensional matrices. A point-and-line jumping disparity transfer algorithm is proposed, and dynamic programming is introduced to create the disparity map step by step. Compared with traditional methods, this approach has a degree of intelligence: the complex scene is first recognized as elements rich in semantic information, which are then matched by their features. The method is fast and adapts well to different scenes.
(6) Shadow recognition and elimination for moving objects has always been difficult in moving object tracking and localization. Chapter 5 uses a binocular stereo system and its theory to eliminate the cast shadows of moving objects outdoors and indoors, respectively, with good results.
All of the matching algorithms proposed in this thesis follow the ideas of artificial intelligence and pattern recognition and are subject to strict computing-speed requirements to ensure that they are practical.
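Item (5) introduces dynamic programming to build the disparity map step by step, but the abstract does not specify the point-and-line jumping transfer algorithm. As a generic illustration of how dynamic programming can replace independent per-pixel choices (not the thesis's method), the sketch below runs a Viterbi-style pass along one rectified scanline, minimizing the matching cost plus a smoothness penalty proportional to the disparity jump between neighbouring pixels. The function `scanline_dp`, the `(W, D)` cost layout, and the linear penalty are assumptions.

```python
import numpy as np


def scanline_dp(cost, penalty):
    """Minimise sum_x cost[x, d_x] + penalty * |d_x - d_{x-1}| along one row.

    cost: (W, D) array, matching cost of pixel x at disparity d.
    Returns the optimal disparity for each pixel, found by DP + backtracking.
    """
    w, n_disp = cost.shape
    disps = np.arange(n_disp)
    # trans[d_prev, d]: smoothness penalty for moving from d_prev to d.
    trans = penalty * np.abs(disps[:, None] - disps[None, :])
    acc = cost[0].astype(np.float64).copy()  # best total cost ending at each d
    back = np.zeros((w, n_disp), dtype=np.int64)
    for x in range(1, w):
        total = acc[:, None] + trans            # shape (d_prev, d)
        back[x] = np.argmin(total, axis=0)      # best predecessor for each d
        acc = cost[x] + total[back[x], disps]   # add this pixel's matching cost
    # Backtrack from the cheapest final disparity.
    path = np.empty(w, dtype=np.int64)
    path[-1] = int(np.argmin(acc))
    for x in range(w - 1, 0, -1):
        path[x - 1] = back[x, path[x]]
    return path


# Usage: true disparity is 1 everywhere, but pixel 2 has a spurious low cost at d = 2.
c = np.ones((5, 3))
c[:, 1] = 0.0
c[2, 1], c[2, 2] = 0.5, 0.0
print(scanline_dp(c, 0.0).tolist())  # [1, 1, 2, 1, 1]  (no smoothing: per-pixel minimum)
print(scanline_dp(c, 1.0).tolist())  # [1, 1, 1, 1, 1]  (the jump now costs more than it saves)
```

With no penalty the result equals the per-pixel winner-takes-all choice; with a penalty, the isolated reading at pixel 2 is overridden because two disparity jumps cost more than the 0.5 difference in matching cost. This version costs O(W·D²) per row.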

  • 【Online publication submitter】 Wuhan University
  • 【Online publication year/issue】 2014, Issue 06