Node Literature

Research on Indoor Semantic Map Building for Service Robots

Semantic Mapping for Domestic Service Robots

【Author】 Wang Feng

【Supervisor】 Chen Xiaoping

【Author Information】 University of Science and Technology of China, Computer Application Technology, 2014, PhD

【Abstract (Chinese original)】 Autonomous perception and understanding of the environment is a long-term goal of robotics and artificial intelligence. In recent years, with continuing progress on indoor service robots and the successive release of novel depth sensors, indoor 3D map building with RGB-D cameras has attracted increasingly broad research attention. In particular, results from many related fields (robotics, computer vision, computer graphics, etc.) have been applied to align and fuse RGB-D image sequences into global 3D scene representations, and there is a large body of in-depth, valuable work on mapping accuracy, scale, speed, and scene models.

Addressing the need of indoor mobile robots for 3D map models, this thesis studies RGB-D map-building methods systematically from the perspective of autonomy and constructs an automated information-processing chain from sensor hardware to high-level semantic models. Autonomy here means that the robot builds the 3D scene map automatically and then analyzes and interprets it to produce a semantic model. Acquiring 3D semantic models of indoor environments autonomously with an RGB-D camera is a low-cost environment-modeling technique with broad application value and important commercial value; for example, the pure RGB-D camera navigation derived from it could replace today's mainstream but more expensive laser-based navigation.

This thesis approaches autonomy from three directions. 1) The robot must be able to scan the scene and collect RGB-D images on its own; planning a camera scanning trajectory automatically from limited prior information, such that the resulting image sequence meets the mapping system's requirements, is one of the problems addressed here. 2) Autonomy demands a highly robust mapping system that, without human intervention, can adapt to real scenes of varied texture and geometry and still build a globally consistent map. A further motivation is to let the robot build 3D maps of large-scale scenes online, acting as an independent agent that continuously perceives and understands its indoor environment, so real-time performance and scalability are also goals. 3) Scene-analysis techniques are needed to interpret the 3D map into a semantic map that the robot can understand. By studying these three subproblems, the thesis explores a new route for robots to build indoor semantic maps autonomously.

The main work falls into three parts. First, for automatic image acquisition, two basic actions are defined, rotational scanning and mobile scanning, and action sequences for scanning the scene are planned on an existing 2D grid map that serves as environmental prior information. An evaluation function scores candidate scan plans, and random search is used to find the optimal one. Second, for RGB-D mapping, keyframes are extracted from the RGB-D image sequence; consecutive keyframes are spatially aligned, loop closures are detected over the keyframe sequence, and global optimization is performed over the keyframe set, yielding a globally consistent 3D map. Frame-to-frame alignment and loop-closure detection are studied in depth to obtain robust, fast mapping performance. KinectFusion is adopted for surface reconstruction, and its memory cost in large-scale scenes is discussed. Third, for scene analysis, a fast plane-detection algorithm for point-cloud scenes is studied; the scene is segmented by extracting planes, features are extracted from the separated parts, and simple rules are used for recognition, converting the unordered point set into a 3D topological map with semantic information.

The main contributions and innovations are threefold. First, a scan-planning method for RGB-D cameras oriented toward 3D mapping is proposed. Current RGB-D mapping (or reconstruction) systems mostly capture images with a hand-held camera, which offers little automation and makes it hard to acquire large-scale scenes efficiently. The proposed method plans the camera's scanning trajectory automatically on a 2D grid map; experiments show that the planned trajectories closely match those designed by RGB-D mapping researchers, demonstrating the method's effectiveness. Second, an extremely robust and fast frame-to-frame alignment algorithm based on point and plane features is proposed. In alignment experiments over adjacent frames of a keyframe sequence (nearly 3,000 frames), the algorithm made no errors, demonstrating its robustness; by avoiding costly ICP-style alignment it runs in real time on a mainstream PC without GPU acceleration, giving the autonomous mapping system an important robustness guarantee. Combined with fast feature-matching-based loop-closure detection, global optimization, and the automatic scan planning above, it forms a 3D map-building system that runs online on the robot. Third, for understanding point-cloud scene models, a fast plane-extraction algorithm based on projective transformation is proposed that detects and extracts planes from scenes of millions of points within seconds; subsequent segmentation and simple recognition yield the scene's semantic map.

Using an inexpensive RGB-D camera, this thesis designs a complete system for mobile robots to automatically build 3D semantic maps of indoor scenes. It should be noted that the system still requires a 2D grid map as prior knowledge of the scene, so in practice it cannot yet fully dispense with mainstream laser-range-finder (LRF) perception and navigation. Removing this dependence, and completing autonomous exploration and semantic mapping of unknown environments with an RGB-D camera alone, is an important direction for future work. Although practical, fully autonomous, pure RGB-D navigation and perception remain some distance away, this work can be regarded as an important step toward that promising goal.

【Abstract】 The construction of intelligent robotic agents that are able to perceive and understand their surroundings autonomously has been a long-standing goal of engineers and scientists in the fields of robotics and artificial intelligence. In recent years, with advances in research on domestic service robots and the release of novel depth sensors, building 3D maps of indoor environments with RGB-D cameras has attracted more and more attention and investigation. In particular, achievements from many areas, such as robotics, computer vision, and computer graphics, have been utilized to align and merge RGB-D image sequences and to obtain global representations of 3D scenes, and there is a large body of in-depth, valuable work on the precision, scale, and speed of map building and on the representation of real indoor environments.

Motivated by the need of indoor mobile robots for 3D map models, this thesis systematically studies map-building approaches with RGB-D cameras from the perspective of autonomy and constructs an automated information-processing chain from sensor hardware to high-level semantic models. In this context, autonomy means that the robot builds 3D maps of the indoor scene automatically and that the maps are analyzed and understood automatically to generate a semantic model. Obtaining a three-dimensional semantic model of an indoor environment with RGB-D cameras is a low-cost environment-modeling technique with extensive application value and important commercial value; for example, the pure RGB-D camera navigation derived from this technique is a promising alternative to the current mainstream but expensive laser-based navigation.

To achieve this autonomy, the following three aspects are considered. Firstly, the robot should be able to scan the scene and capture RGB-D images automatically.
How to plan the camera's scanning trajectory automatically from limited prior information about the scene, so that the generated image sequence meets the mapping system's requirements, is one of the issues addressed in this thesis. Secondly, autonomy requires the mapping system to be robust enough to adapt to real scenes with various textures and structures and to build a globally consistent map even without manual intervention. In addition, one motivation of this thesis is to let the robot build 3D maps of large-scale scenes online, acting as an autonomous agent that continuously perceives and understands its surroundings, so real-time performance and scalability are also pursued. Thirdly, the 3D maps of the scene need to be interpreted into semantic maps that the robot can understand.

By investigating these three sub-problems, a new scheme in which robots build indoor semantic maps autonomously is presented. Specifically, the main work of this thesis is divided into the following three aspects. First, for the automatic image acquisition problem, two basic actions are defined, rotational scanning and mobile scanning, and action sequences for scanning the scene are planned on an existing 2D grid map, which is regarded as prior information about the environment. To obtain the best scan plan, a gain function is defined to evaluate the merits of candidate plans, and the optimal solution is found by random search. Second, for RGB-D mapping, key frames are extracted from the RGB-D image sequence, and spatial alignment, loop-closure detection, and global optimization are performed on the key-frame sequence to obtain globally consistent 3D maps. The alignment and loop-closure problems on RGB-D frames are investigated in depth in order to achieve robust and rapid mapping performance.
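The scan-planning step above, scoring candidate action sequences with a gain function and picking the best one by random search, can be sketched as follows. This is a hedged illustration only: the toy grid map, the coverage-based gain function, the travel-cost weight, and all names (`FREE_CELLS`, `coverage`, `random_search`) are assumptions for the sketch, not the thesis's actual formulation.

```python
import random

# Illustrative sketch of random-search scan planning on a 2D grid map.
# A "plan" here is just a sequence of viewpoint cells; the rotational and
# mobile scanning actions of the thesis are abstracted away.

FREE_CELLS = {(x, y) for x in range(10) for y in range(10)}  # toy grid map

def coverage(cell, radius=2):
    """Cells a scan taken at `cell` would observe (toy sensor model)."""
    cx, cy = cell
    return {(x, y) for (x, y) in FREE_CELLS
            if abs(x - cx) <= radius and abs(y - cy) <= radius}

def gain(plan):
    """Score a plan: total covered area minus a small travel cost."""
    covered = set()
    for cell in plan:
        covered |= coverage(cell)
    travel = sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
                 for a, b in zip(plan, plan[1:]))
    return len(covered) - 0.1 * travel

def random_search(n_viewpoints=4, iters=2000, seed=0):
    """Sample random plans and keep the best-scoring one."""
    rng = random.Random(seed)
    cells = sorted(FREE_CELLS)
    best, best_score = None, float("-inf")
    for _ in range(iters):
        plan = rng.sample(cells, n_viewpoints)
        score = gain(plan)
        if score > best_score:
            best, best_score = plan, score
    return best, best_score

plan, score = random_search()
```

In practice the search space would be constrained to reachable free cells of the robot's grid map, and the gain function would reflect the real sensor's field of view, but the optimize-by-sampling structure is the same.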
In addition, the method presented in KinectFusion is adopted for elaborate surface reconstruction, and the storage problem of extending it to large-scale environments is discussed. Third, for scene analysis, a fast plane-detection algorithm is presented for the point cloud of the scene. The point cloud is segmented by extracting the planes in the scene; features are then extracted from the separated parts and recognized with simple rules for indoor environments. As a result, the unsorted point set is converted into a 3D topological map with semantic information.

The contributions and innovations of this thesis include the following three aspects. Firstly, an RGB-D camera scan-path planning method is proposed for 3D mapping. The input images of current RGB-D mapping systems are captured mainly by a hand-held camera traversing the environment, which offers a low degree of automation, especially when collecting images of large-scale scenes. With the proposed method, the camera's scanning trajectory can be obtained automatically; experiments show that the automatically planned scan trajectory is very similar to the one designed by an expert, which proves the effectiveness of the method. Secondly, this thesis presents an extremely robust and fast RGB-D image alignment algorithm based on point and plane features. In frame-to-frame alignment experiments on adjacent frames of the key-frame sequence (close to 3,000 frames), the algorithm produced no errors, demonstrating its robustness. The algorithm avoids time-consuming ICP-style alignment techniques, so it is highly efficient and achieves real-time performance on a mainstream PC without GPU acceleration, which in turn guarantees the robustness of the automatic mapping system.
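The key advantage claimed above is that matched features admit a closed-form alignment, with no iterative ICP loop. As a hedged illustration of that closed-form idea only, the sketch below solves the classic least-squares rigid alignment of matched point pairs in 2D (Horn's method restricted to the plane); the thesis's actual algorithm works on 3D point and plane features, which this does not reproduce.

```python
import math

# Closed-form least-squares rigid alignment of matched 2D point pairs.
# Given correspondences src[i] <-> dst[i], find rotation theta and
# translation (tx, ty) such that R(theta) * src + t best matches dst.

def align_2d(src, dst):
    """Return (theta, tx, ty) minimizing sum |R*src_i + t - dst_i|^2."""
    n = len(src)
    # Centroids of both point sets.
    scx = sum(p[0] for p in src) / n
    scy = sum(p[1] for p in src) / n
    dcx = sum(p[0] for p in dst) / n
    dcy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross products of centered correspondences.
    sdot = scross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - scx, y - scy, u - dcx, v - dcy
        sdot += x * u + y * v     # sum of dot products
        scross += x * v - y * u   # sum of cross products
    theta = math.atan2(scross, sdot)  # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = dcx - (c * scx - s * scy)
    ty = dcy - (s * scx + c * scy)
    return theta, tx, ty
```

Because the solution is a handful of sums and one `atan2`, its cost is linear in the number of correspondences, which is why such closed-form alignment can run in real time on a CPU.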
In addition, the combination of rapid feature-matching-based loop-closure detection and global optimization techniques, coupled with the automatic scan-planning method described earlier, constitutes a 3D online mapping system running on the robot. Thirdly, a rapid plane-extraction algorithm based on projective transformation is presented for understanding the scene through its point-cloud model; it takes only a few seconds to detect and extract the planes from a scene containing millions of 3D points. The semantic map is obtained by subsequent scene segmentation and simple recognition.

In summary, this thesis designs a system that automatically builds 3D semantic maps of indoor scenes using cheap RGB-D cameras. It should be noted that the system still needs a 2D grid map as prior knowledge of the scene; therefore, in practical applications, it cannot yet completely depart from mainstream LRF-based perception and navigation. Getting rid of the prior knowledge of the scene and fulfilling exploration and semantic mapping of unfamiliar environments using only an RGB-D camera will be an important direction for future work. Though much work remains before practical, fully autonomous, pure RGB-D camera navigation and perception is realized, this work can still be regarded as an important step toward this promising goal.
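The abstract does not spell out the projective-transformation plane extractor, so the sketch below uses a different, generic technique, plain RANSAC plane fitting, purely to illustrate the task of pulling a dominant plane out of an unordered point set. All parameters (`iters`, the inlier tolerance `tol`) are illustrative assumptions.

```python
import random

# Generic RANSAC plane detection on a 3D point cloud. Note: this is NOT
# the thesis's projection-transform algorithm, only a stand-in example
# of extracting the best-supported plane from unordered points.

def fit_plane(p, q, r):
    """Unit normal n and offset d of the plane n.x = d through 3 points."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    nx = uy * vz - uz * vy        # cross product u x v
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm == 0.0:
        return None               # degenerate (collinear) sample
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    return (nx, ny, nz), nx * p[0] + ny * p[1] + nz * p[2]

def ransac_plane(points, iters=200, tol=0.02, seed=0):
    """Return (normal, d, inliers) of the best-supported plane found."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iters):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        (nx, ny, nz), d = model
        inliers = [p for p in points
                   if abs(nx * p[0] + ny * p[1] + nz * p[2] - d) < tol]
        if len(inliers) > len(best[2]):
            best = ((nx, ny, nz), d, inliers)
    return best
```

To segment a whole scene, such an extractor is typically applied repeatedly: detect the dominant plane, remove its inliers, and recurse on the remainder until no well-supported plane is left.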
