
On Reinforcement Learning Control for Bionic Underwater Robots

【Author】 Lin Longxin

【Supervisor】 Shen Lincheng

【Author Information】 National University of Defense Technology, Control Science and Engineering, 2010, Ph.D. dissertation

【摘要】 Bionic underwater robots have been one of the research hotspots in the underwater robotics field in recent years. Their complicated dynamic characteristics and uncertain working environments make the motion control problem highly challenging and directly constrain overall performance. Focusing on a class of bionic underwater robots propelled by two undulating fins, this thesis addresses the motion control problem within a reinforcement learning framework, covering motion control problem analysis, reinforcement learning algorithm construction, reinforcement learning attitude stabilization, reinforcement learning trajectory tracking, and experimental verification. The main work and results are as follows.

(1) The motion control problem of this class of robots is analyzed systematically from the perspectives of biological inspiration and the dynamic characteristics of the bionic undulating fin and the robot itself. The external morphology and swimming characteristics of the biological prototypes are studied; guided by these inspirations, a bionic undulating-fin thruster and a combined propulsion and control scheme of "two bionic undulating fins + two swing fins + a 2-DOF bionic bladder" are designed. Thrust tests and motion tests on the physical devices yield the relevant dynamic characteristics, which guide the design of the robot's motion control methods.

(2) To meet the practical needs of robot control and overcome the limitations of the basic Q-learning algorithm, a continuous state-action space neural Q-learning algorithm (CSANQL) oriented toward real robot control applications is proposed. It synthetically uses a feedforward neural network, a database of learning samples, a fitting function over estimated Q values, and the basic Q-learning algorithm to realize fast and effective mapping between continuous states and continuous actions. Two implementation structures of neural Q-learning are studied, the mechanism of realizing continuous actions through the fitting function is revealed, the role of the sample database in improving learning efficiency is analyzed, and the way to combine reinforcement learning algorithms with the robot's motion control is clarified, laying the foundation for the reinforcement learning control methods studied in this thesis.

(3) For the attitude stabilization problem, three reinforcement learning methods are proposed and implemented at the levels of learning optimization and learning control: reinforcement learning adaptive PID control, reinforcement learning control, and supervised reinforcement learning control. The reinforcement-learning-based parameter adaptation mechanism is studied, the important roles of the sample database and supervisory control are analyzed, and the effectiveness of the methods is preliminarily verified by simulation. The results show that the reinforcement learning adaptive PID controller actively learns optimal PID parameters and stabilizes attitude well; that the performance of the CSANQL-based reinforcement learning controller depends on the sample database and reaches the stabilization goal when the database capacity is appropriate; and that introducing supervisory control speeds up convergence and keeps the output actions stable, especially early in learning, so the supervised reinforcement learning controller stabilizes attitude better than the other two.

(4) For the trajectory tracking problem, a behavior control structure based on reinforcement learning behaviors is proposed and implemented. Thrusting, yawing, and depth-keeping are extracted from complex trajectory tracking tasks as the three basic control behaviors underlying all tracking tasks; the basic behaviors are designed with reinforcement learning control methods; a reinforcement-learning-based behavior combination optimization method is proposed; and simulations of straight-line and curved trajectory tracking in 3-D space are conducted. The results show that the structure responds quickly to the target trajectory and also tracks well in complex multi-channel tasks.

(5) Based on the research group's self-developed bionic underwater robot test system, experiments further verify the proposed reinforcement learning control methods in both attitude stabilization and trajectory tracking. The CSANQL-based supervised reinforcement learning controller stabilizes attitude better than a pure reinforcement learning controller or a conventional PID controller, and under the behavior control structure the robot tracks the set trajectories well.

This work is a useful exploration of the motion control of bionic underwater robots and the practical application of reinforcement learning control, laying a foundation for eventually realizing efficient autonomous motion control of bionic underwater robots within the reinforcement learning framework.
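The CSANQL algorithm described above combines a feedforward Q-network, a bounded database of learning samples, and a fitting function over estimated Q values to pick continuous actions. The following is a minimal illustrative sketch of that combination only; the class name, network sizes, learning rates, and the quadratic-fit action search are all assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class CSANQLSketch:
    """Hypothetical sketch: a small MLP approximates Q(s, a); transitions are
    kept in a bounded sample database and replayed; a quadratic fit over a few
    sampled actions yields a continuous greedy action."""

    def __init__(self, state_dim, hidden=16, alpha=0.01, gamma=0.95):
        self.gamma, self.alpha = gamma, alpha
        # one-hidden-layer MLP over the concatenated (state, action) vector
        self.W1 = rng.normal(0.0, 0.3, (hidden, state_dim + 1))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.3, hidden)
        self.db = []  # sample database of (s, a, r, s') tuples

    def q(self, s, a):
        h = np.tanh(self.W1 @ np.append(s, a) + self.b1)
        return float(self.W2 @ h)

    def greedy_action(self, s, lo=-1.0, hi=1.0):
        # evaluate Q at a few candidate actions, fit a quadratic in a,
        # and take its analytic maximizer, clipped to the action range
        acts = np.linspace(lo, hi, 7)
        qs = np.array([self.q(s, a) for a in acts])
        c2, c1, _ = np.polyfit(acts, qs, 2)
        if c2 < 0:  # concave fit -> interior maximum
            return float(np.clip(-c1 / (2.0 * c2), lo, hi))
        return float(acts[int(np.argmax(qs))])

    def store(self, s, a, r, s2, cap=200):
        # bounded sample database; capacity matters, as the abstract notes
        self.db.append((np.asarray(s, float), float(a), float(r),
                        np.asarray(s2, float)))
        if len(self.db) > cap:
            self.db.pop(0)

    def replay(self, batch=8):
        # one TD(0) gradient step per sample drawn from the database
        n = len(self.db)
        if n == 0:
            return
        for i in rng.choice(n, size=min(batch, n), replace=False):
            s, a, r, s2 = self.db[i]
            target = r + self.gamma * self.q(s2, self.greedy_action(s2))
            x = np.append(s, a)
            h = np.tanh(self.W1 @ x + self.b1)
            err = target - float(self.W2 @ h)
            grad_h = self.W2 * (1.0 - h ** 2)
            self.W2 += self.alpha * err * h
            self.W1 += self.alpha * err * np.outer(grad_h, x)
            self.b1 += self.alpha * err * grad_h

# toy 1-D regulation demo: drive a scalar state toward zero
agent = CSANQLSketch(state_dim=1)
s = np.array([0.8])
for step in range(50):
    a = agent.greedy_action(s) if rng.random() > 0.2 else float(rng.uniform(-1, 1))
    s2 = s + 0.1 * np.array([a])          # simple integrator dynamics
    agent.store(s, a, -float(s2[0] ** 2), s2)
    agent.replay()
    s = s2
```

The quadratic fit is one simple way to obtain a continuous action from finitely many Q estimates; the thesis's fitting function may differ in form and in the number of sampled actions.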

【Abstract】 The bionic underwater robot is one of the hotspots in underwater robotics research in recent years. Its complicated dynamic characteristics and uncertain working environments make motion control a challenging problem. This thesis takes the bionic underwater robot with two undulating fins as its research object and addresses the motion control problem within the framework of reinforcement learning. The studies concentrate on motion control problem analysis, reinforcement learning algorithm design, reinforcement learning based attitude stabilization, reinforcement learning based trajectory tracking, and the corresponding experimental verification. The main achievements and progress are as follows.

(1) The motion control problems of the bionic underwater robot with two undulating fins are analyzed from the perspectives of bionic inspiration and the dynamic characteristics of the bionic undulating fin and the robot itself. The morphology and swimming characteristics of Gymnarchus niloticus and the bluespotted ray are investigated to inspire the design of bionic underwater robots; the bionic undulating-fin thruster and the robot with two bionic undulating fins, two swing fins, and a 2-DOF bionic bladder are then designed. Thrust tests and motion tests are carried out, and the resulting dynamic characteristics guide the design of the robot's motion controller.

(2) To meet the requirements of robot control and overcome the restrictions of the basic Q-learning algorithm, a continuous state and action space neural Q-learning algorithm (CSANQL) is presented, laying a foundation for reinforcement learning control of bionic underwater robots. By synthetically utilizing a feedforward neural network, a database of learning samples, a fitting function over estimated Q values, and the basic Q-learning algorithm, CSANQL realizes fast mapping between continuous states and continuous actions. The two implementation structures of neural Q-learning, the mechanism of generating continuous actions from the fitting function, and the effect of the sample database on learning efficiency are detailed, and the approaches for incorporating reinforcement learning algorithms into the motion control of bionic underwater robots are discussed.

(3) For attitude stabilization, three reinforcement learning based methods are put forward and implemented: a reinforcement learning based adaptive PID controller, a reinforcement learning controller, and a supervised reinforcement learning controller. The reinforcement learning based parameter adaptation mechanism is discussed, and the roles of the database of learning samples and the supervisory controller are elaborated. Simulations show that the adaptive PID controller actively learns optimal PID parameters and achieves good attitude stabilization; that the performance of the reinforcement learning controller depends strongly on the sample database and is satisfactory when the database capacity is appropriate; and that the supervised reinforcement learning controller outperforms the other two in learning efficiency and dynamic response.

(4) For trajectory tracking, a behavior control structure based on reinforcement learning behaviors is devised and implemented. Thrusting, yawing, and depth-keeping behaviors are extracted from complex trajectory tracking tasks as three basic control behaviors from which almost all trajectories in 3-D space can be realized. The implementation and performance of each basic behavior are discussed, and a reinforcement learning based optimization method for behavior combination is expounded. Simulations with straight-line and curved trajectories confirm that the structure responds quickly to the target trajectory and tracks it with favorable performance.

(5) Experiments on the bionic underwater robot with two undulating fins test the reinforcement learning control methods in both attitude stabilization and trajectory tracking. The experimental data show that the CSANQL-based supervised reinforcement learning controller stabilizes attitude better than a pure reinforcement learning controller or a conventional PID controller, and that the robot performs the set trajectory tracking tasks well under the behavior control structure.

These achievements facilitate progress on the motion control of bionic underwater robots and the practical application of reinforcement learning control methods, and consequently lay the foundation for realizing efficient autonomous motion control of bionic underwater robots within the reinforcement learning framework.
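The supervised reinforcement learning controller described above lets a supervisory controller keep early actions stable while authority gradually shifts to the learned policy as training progresses. One hypothetical blending rule that captures this idea (the function name, the exponential schedule, and the `handover` parameter are illustrative assumptions, not the thesis's actual scheme):

```python
import math

def supervised_rl_action(u_rl: float, u_pid: float, episode: int,
                         handover: float = 50.0) -> float:
    """Blend the learned action u_rl with the supervisory PID action u_pid.
    Early in learning (episode ~ 0) the PID output dominates, keeping the
    output actions stable; the mixing weight k rises toward 1 with the
    episode count, handing control over to the learned policy."""
    k = 1.0 - math.exp(-episode / handover)   # 0 at episode 0, -> 1 later
    return (1.0 - k) * u_pid + k * u_rl
```

At episode 0 the output equals the supervisory PID action exactly; after many episodes it is essentially the learned action, which is one simple way to realize the faster convergence and stable early-learning behavior the abstract attributes to supervision.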
