
Control of Fixed-Wing UAV Carrier Landing Based on Improved Deep Reinforcement Learning

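【Author】 Liu Lu (刘璐)

【Advisors】 Xu Guangyan (徐光延); Xi Jie (郗杰)

【Author Information】 Shenyang Aerospace University, Master of Engineering (professional degree), 2022, master's thesis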

【Abstract】 With the continuing development of the aviation industry, researchers have studied control methods for shipborne UAVs extensively. The carrier air wake and ground effect seriously degrade the stability and safety of a UAV during carrier landing. To improve autonomous carrier landing capability, this thesis applies the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm from Deep Reinforcement Learning (DRL), together with an improved TD3 algorithm, to the autonomous landing of UAVs subject to air wake and ground effect. First, the autonomous landing tracking control problem is described, the discrete-time model of UAV carrier landing is converted into a finite Markov Decision Process (MDP) composed of state and control variables, and a DRL-based autonomous landing controller framework is designed. Second, a TD3 autonomous carrier landing controller is designed. Within its training framework, the experience generated in each training episode is stored in a replay buffer; mini-batches of experience are then sampled at random from the buffer and used to update the controller network parameters. Simulations with the trained TD3 controller give results that all meet the flight requirements. Compared with the simulation results of a Nonlinear Dynamic Inversion (NDI) controller, a classical nonlinear control method, the TD3 controller not only controls the landing trajectory accurately but also copes with the nonlinear air wake disturbance that NDI cannot handle, which greatly improves landing safety. Finally, to address the shortcomings of TD3, this thesis proposes an improved TD3 algorithm and applies it successfully to the autonomous UAV landing problem. The algorithm introduces a novel architecture with two actors and two critics, which evaluate the deterministic policy and the action-value function, respectively. Unlike the traditional single replay buffer, the improved TD3 algorithm splits the replay buffer into two according to whether the landing succeeded, and samples from the two buffers with prioritized experience replay and ordinary experience replay, respectively, which makes sampling more effective. After extensive training, an improved-TD3 autonomous carrier landing controller is obtained that controls the UAV precisely in an environment affected by the air wake. Simulation results show that the controller significantly improves sampling efficiency and training efficiency.
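The split replay buffer described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The class name `DualReplayBuffer`, the `alpha` exponent, the 50/50 mixing ratio, and the choice to apply prioritized sampling to the successful-landing pool and uniform sampling to the other pool are all assumptions made here for illustration; the abstract does not specify them.

```python
import random


class DualReplayBuffer:
    """Two experience pools split by landing outcome (hypothetical sketch)."""

    def __init__(self, capacity=10000, alpha=0.6, success_fraction=0.5, seed=None):
        self.capacity = capacity
        self.alpha = alpha                        # priority exponent (assumed value)
        self.success_fraction = success_fraction  # share of each batch taken from the success pool (assumed)
        self.success = []        # transitions from episodes that landed successfully
        self.success_prio = []   # one priority per entry of self.success
        self.failure = []        # transitions from all other episodes
        self.rng = random.Random(seed)

    def add_episode(self, transitions, landed, priority=1.0):
        """Route a finished episode to one pool according to its outcome."""
        for t in transitions:
            if landed:
                self.success.append(t)
                self.success_prio.append(priority)
                if len(self.success) > self.capacity:   # drop the oldest entry
                    self.success.pop(0)
                    self.success_prio.pop(0)
            else:
                self.failure.append(t)
                if len(self.failure) > self.capacity:
                    self.failure.pop(0)

    def update_priorities(self, indices, td_errors, eps=1e-6):
        """Set priorities from |TD error|; indices are valid only until the next add."""
        for i, err in zip(indices, td_errors):
            self.success_prio[i] = abs(err) + eps

    def sample(self, batch_size):
        """Return (batch, success_indices): prioritized draws first, then uniform draws."""
        if not self.success and not self.failure:
            raise ValueError("both pools are empty")
        n_succ = round(batch_size * self.success_fraction) if self.success else 0
        if not self.failure:
            n_succ = batch_size
        n_fail = batch_size - n_succ

        idx = []
        if n_succ:
            # Proportional prioritization: P(i) is proportional to p_i ** alpha.
            weights = [p ** self.alpha for p in self.success_prio]
            idx = self.rng.choices(range(len(self.success)), weights=weights, k=n_succ)
        batch = [self.success[i] for i in idx]
        # Ordinary experience replay: uniform draws with replacement.
        batch += [self.rng.choice(self.failure) for _ in range(n_fail)]
        return batch, idx


buf = DualReplayBuffer(seed=0)
buf.add_episode([("ok", i) for i in range(3)], landed=True)
buf.add_episode([("miss", i) for i in range(5)], landed=False)
batch, idx = buf.sample(8)
```

The returned indices refer to the success pool, so a training loop could pass them back to `update_priorities` together with the new TD errors. A full prioritized replay scheme would normally also apply importance-sampling weights, which this sketch omits.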

  • 【Classification Code】 V279; V249.1