
Research on Path Planning for Mobile Robot Based on Reinforcement Learning

【Author】 Xu Ya

【Supervisor】 Ma Xin

【Author Information】 Shandong University, Control Science and Engineering, 2013, Master's degree

【Abstract】 With the development of robot technology, robots have begun to be deployed in unknown environments. Compared with path planning research in known environments, the exploration of unknown environments poses new challenges: lacking prior knowledge of its surroundings, a mobile robot inevitably encounters a variety of obstacles during path planning. Research on mobile robots with flexible planning and obstacle-avoidance capabilities, and on their path planning in unknown environments, is therefore of great practical significance. Taking the path planning of mobile robots exploring unknown environments as its research background, this thesis uses reinforcement learning to realize robot path planning. The existing reinforcement learning algorithms, Q-learning and Q(λ), can accomplish mobile robot path planning, but in large and complex environments they rarely achieve satisfactory results; their chief defects are long learning times and slow convergence. To address these problems, this thesis proposes a single-chain sequential backtracking Q-learning algorithm: a state chain is built during learning, and the idea of backtracking is used to alleviate the delayed propagation of information in Q-learning, so that the action decision at the current state is quickly influenced by subsequent action decisions. The algorithm is applied to the path planning of single and multiple robots in unknown environments, solving the slow-learning problem as well as the robots' obstacle-avoidance and collision-avoidance problems, enabling a mobile robot to find an optimal path quickly and effectively; simulation experiments verify the algorithm's effectiveness. The thesis first analyzes the research background and significance of mobile robot path planning, surveys the state of the art at home and abroad together with the main open problems, and outlines the content and chapter structure of the thesis. Second, it introduces the main types of mobile robot path planning techniques and elaborates on global and local path planning algorithms; for the reinforcement learning approach adopted here, it reviews the state of research, development trends, and open issues, explains the basic concepts, principles, and methods of reinforcement learning, and describes its application to path planning. Third, addressing the long learning times, slow convergence, and poor scalability to larger, more complex environments of the widely used Q-learning and Q(λ) algorithms, it proposes the high-performance single-chain sequential backtracking Q-learning algorithm, which uses backtracking to update state data, and applies it to mobile robot path planning in complex environments; experiments in environments of different sizes and complexities verify its fast convergence and its practicality in large environments, providing a new method for mobile robot path planning. Fourth, taking a multi-robot system as the research object, the proposed high-performance reinforcement learning algorithm is used to solve the path planning problem during exploration through learned strategies in uncertain environments, realizing obstacle avoidance for each robot and resolving conflicts between robots, thereby improving the efficiency of reaching the goal. Finally, the work of the thesis is summarized and directions for future research are given.
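The baseline the thesis improves on is one-step Q-learning. As a minimal sketch of its update rule (the grid state, action names, and learning parameters below are illustrative assumptions, not the thesis's experimental setup):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_b Q(s',b) - Q(s,a)]."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Illustrative grid-world step: moving right from (0, 0) earns reward 1.
Q = defaultdict(float)
actions = ["up", "down", "left", "right"]
q_update(Q, (0, 0), "right", 1.0, (0, 1), actions)
print(Q[((0, 0), "right")])  # 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```

With plain one-step updates, a reward received at the goal propagates backward only one state per visit, which is why learning is slow in large environments; this lag is exactly the defect the proposed backtracking algorithm targets.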

【Abstract】 With the development of robot technology, robots have begun to be applied in unknown environments. Compared with path planning research in known environments, exploring an unknown environment brings new challenges to mobile robot path planning: because the robot has no prior knowledge of the environment, it will inevitably encounter a variety of obstacles while exploring. Research on mobile robots that can avoid obstacles and plan flexibly in unknown environments therefore has great practical significance. In this thesis, reinforcement learning is used to study path planning for mobile robots exploring unknown environments. The existing reinforcement learning algorithms, Q-learning and Q(λ), can achieve mobile robot path planning, but they are difficult to apply in large and complex environments; their biggest drawbacks are long learning times and slow convergence. Aiming at these problems, a state-chain sequential feedback Q-learning algorithm based on the idea of backtracking is proposed for quickly searching for the optimal path of mobile robots in complex unknown static environments. The state chain is built during the search; after an action is chosen and its reward received, the Q-values of the state-action pairs on the previously built state chain are sequentially updated with one-step Q-learning. With the proposed algorithm, a single robot can solve the obstacle-avoidance problem and multiple robots can avoid collisions with each other during path planning in unknown environments. Extensive simulations validate the efficiency of the proposed approach: the results show that it has a high convergence speed and that the robot can find a collision-free optimal path in complex unknown static environments in much less time.
Firstly, the thesis analyzes the research background and significance of mobile robot path planning, summarizes the research and development of mobile robot path planning at home and abroad together with the main open problems, and briefly describes the content and chapter structure of the thesis. Secondly, it introduces the main types of mobile robot path planning technology and presents global and local path planning algorithms in detail; for reinforcement learning, it reviews the state of research, development trends, and remaining problems, and describes the basic concepts, principles, methods, and applications of reinforcement learning. Thirdly, aiming at the long learning times, slow convergence, and difficulty of applying the Q-learning and Q(λ) algorithms to larger, more complex environments, the state-chain sequential feedback Q-learning algorithm, which uses backtracking to update state data, is proposed for quickly searching for optimal paths in complex unknown static environments; extensive simulations in environments of different sizes and complexities show fast convergence and practicality in large environments, providing a new method for mobile robot path planning.
Fourthly, the multi-robot system is studied using the proposed high-performance reinforcement learning algorithm: by learning exploration strategies in an uncertain environment, each robot avoids obstacles and collisions with other robots during path planning, improving the efficiency of reaching the goal. Finally, conclusions are given with recommendations for future work.
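The state-chain mechanism described above can be sketched as follows. This is an illustration under stated assumptions (a single placeholder action, string-valued states, and illustrative `alpha` and `gamma`), not the thesis's exact implementation:

```python
from collections import defaultdict

def chain_step(Q, chain, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Record the new transition on the state chain, then sweep the chain
    backwards, re-applying the one-step Q-learning update to every recorded
    state-action pair, so fresh reward information propagates toward the
    start of the episode in a single sweep."""
    chain.append((s, a, r, s_next))
    for cs, ca, cr, cn in reversed(chain):
        best_next = max(Q[(cn, b)] for b in actions)
        Q[(cs, ca)] += alpha * (cr + gamma * best_next - Q[(cs, ca)])

# Three-step episode with reward only at the goal: after a single episode
# the start state's Q-value is already nonzero, whereas with plain one-step
# Q-learning it would still be zero.
Q, chain, actions = defaultdict(float), [], ["f"]
chain_step(Q, chain, "s0", "f", 0.0, "s1", actions)
chain_step(Q, chain, "s1", "f", 0.0, "s2", actions)
chain_step(Q, chain, "s2", "f", 1.0, "goal", actions)
print(Q[("s0", "f")])  # 0.10125: the goal reward has reached the start
```

The backward sweep is what shortens the credit-assignment delay: each new reward is immediately folded into every earlier decision on the chain, which matches the abstract's claim of faster convergence in large environments.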

  • 【Online Publication Contributor】 Shandong University
  • 【Online Publication Year/Issue】 2013, Issue 11
  • 【CLC Classification】 TP242; TP181
  • 【Times Cited】 3
  • 【Downloads】 555