Research on Reinforcement Learning Methods for Navigation and Control of Autonomous Mobile Robots

【Author】 李兆斌 (Li Zhaobin)

【Supervisor】 徐昕 (Xu Xin)

【Author's Basic Information】 National University of Defense Technology, Control Science and Engineering, 2010, Master's thesis

【Abstract】 Using machine learning, and in particular reinforcement learning (RL), to improve the control performance of mobile robots in unknown environments and their ability to adapt to those environments is an important trend in research on the navigation and control of autonomous mobile robots. Supported by the National Natural Science Foundation of China project "Research on Kernel-Based Reinforcement Learning and Approximate Dynamic Programming Methods", this thesis studies the performance evaluation of approximate policy iteration (API) algorithms, the automatic optimization of the kernel parameters in kernel-based least-squares policy iteration (KLSPI), and the application of API to obstacle-avoidance control of mobile robots and to learning control of the longitudinal velocity of autonomous vehicles. The main results and contributions are as follows:

1. The performance of API algorithms was evaluated. Comparative experiments show that API algorithms, and KLSPI in particular, perform better on sequential decision-making problems whose value functions are smooth, indicating that the smoothness of the value function is an important factor in API performance. To remove the need to select the kernel parameters of KLSPI by hand, a sparse kernel dictionary is first obtained by ε-ball nearest-neighbour analysis of the initial samples, and a kernel-width optimization method based on gradient descent on the Bellman residual is then proposed. Simulation tests verify the effectiveness of this kernel-parameter optimization method (a sketch of the two steps follows the abstract).

2. The autonomous obstacle-avoidance decision process of a mobile robot is modelled as a Markov decision process (MDP), and rolling-window path planning is combined with API to obtain a learning control method for autonomous obstacle avoidance in unknown environments. Simulations verify the generalization performance of the method and its adaptability to unknown environments. The learning efficiency of two different API algorithms for autonomous obstacle avoidance is also compared; the results show that the KLSPI-based method converges to a near-optimal policy more quickly (the shared API machinery is sketched below).

3. After an analysis of the state of the art, the key difficulties, and the significance of autonomous learning control systems for highway autonomous vehicles, the vehicle motion-control process in a highway environment is modelled as an MDP, and an API learning control method for the longitudinal velocity of autonomous vehicles on highways is proposed and studied in simulation. The results show that the API-based learning control method tracks the desired velocity fairly accurately, laying a foundation for further research on learning control of autonomous vehicles (a toy MDP formulation is sketched at the end).
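The first contribution describes two kernel-parameter steps for KLSPI: building a sparse kernel dictionary from the initial samples with an ε-ball nearest-neighbour test, and tuning the kernel width by gradient descent on the Bellman residual. The Python sketch below is one plausible reading of those steps, assuming a Gaussian kernel, a linear value model over the dictionary features, and a simplified sample format of (input, reward, next input); the function names and learning rate are illustrative assumptions, not the thesis's actual implementation.

import numpy as np

def gaussian_kernel(x, c, width):
    """Gaussian (RBF) kernel between a sample x and a dictionary centre c."""
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * width ** 2))

def sparsify_epsilon_ball(samples, eps):
    """Keep a sample as a dictionary centre only if no existing centre
    lies within an epsilon-ball around it."""
    dictionary = []
    for x in samples:
        if all(np.linalg.norm(x - c) > eps for c in dictionary):
            dictionary.append(x)
    return np.array(dictionary)

def features(x, dictionary, width):
    """Kernel feature vector of a sample with respect to the dictionary."""
    return np.array([gaussian_kernel(x, c, width) for c in dictionary])

def bellman_residual_width_step(transitions, weights, dictionary, width,
                                gamma=0.95, lr=1e-3):
    """One gradient-descent step on the mean squared Bellman residual
    delta = r + gamma * Q(x') - Q(x) with respect to the kernel width."""
    grad = 0.0
    for x, r, x_next in transitions:
        phi = features(x, dictionary, width)
        phi_next = features(x_next, dictionary, width)
        delta = r + gamma * phi_next @ weights - phi @ weights
        # derivative of the Gaussian kernel features with respect to the width
        dphi = phi * np.sum((x - dictionary) ** 2, axis=1) / width ** 3
        dphi_next = phi_next * np.sum((x_next - dictionary) ** 2, axis=1) / width ** 3
        grad += 2.0 * delta * (gamma * dphi_next @ weights - dphi @ weights)
    return width - lr * grad / len(transitions)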

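Contributions 2 and 3 both plug an approximate-policy-iteration learner into an application MDP. The minimal LSPI-style sketch below shows the shared machinery under common assumptions: least-squares temporal-difference evaluation of the current greedy policy followed by greedy improvement. The feature map, action set, sample format, and stopping tolerance are placeholders chosen for illustration rather than the thesis's own choices.

import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.95, reg=1e-6):
    """Least-squares temporal-difference evaluation of a fixed policy.
    samples: list of (state, action, reward, next_state) tuples."""
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, n_features, n_iter=20, gamma=0.95):
    """Alternate policy evaluation (LSTD-Q) and greedy policy improvement."""
    w = np.zeros(n_features)
    greedy = lambda s: max(actions, key=lambda a: phi(s, a) @ w)
    for _ in range(n_iter):
        w_new = lstdq(samples, phi, greedy, n_features, gamma)
        if np.linalg.norm(w_new - w) < 1e-6:   # weights (policy) converged
            w = w_new
            break
        w = w_new
    return w, greedy

In use, samples would be collected by driving the robot or vehicle in simulation with an exploratory policy and then passed to lspi together with the chosen feature map, for example the kernel features of the previous sketch.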
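As an illustration of the MDP modelling step in contribution 3, the toy environment below casts highway longitudinal velocity control as a discrete-action MDP: the state holds the current velocity and the tracking error, the actions are commanded accelerations, and the reward penalises the squared velocity error. The dynamics, action set, and all numerical values are simplified assumptions standing in for the thesis's own vehicle model.

import numpy as np

class LongitudinalVelocityMDP:
    """Toy discrete-action MDP for tracking a desired longitudinal velocity."""

    def __init__(self, v_desired=25.0, dt=0.1, tau=0.5):
        self.v_desired = v_desired                    # desired speed [m/s]
        self.dt = dt                                  # control period [s]
        self.tau = tau                                # actuator lag time constant [s]
        self.actions = [-2.0, -1.0, 0.0, 1.0, 2.0]    # commanded accelerations [m/s^2]
        self.v = 0.0
        self.a = 0.0

    def reset(self, v0=0.0):
        self.v, self.a = v0, 0.0
        return self._state()

    def _state(self):
        return np.array([self.v, self.v_desired - self.v])

    def step(self, action_idx):
        """Apply one acceleration command and return (next_state, reward)."""
        a_cmd = self.actions[action_idx]
        # first-order lag between commanded and actual acceleration
        self.a += self.dt / self.tau * (a_cmd - self.a)
        self.v = max(0.0, self.v + self.a * self.dt)
        reward = -(self.v_desired - self.v) ** 2      # penalise tracking error
        return self._state(), reward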