
Application of Reinforcement Learning to Coplanar Two-Aircraft Air Combat

Reinforcement Learning with Its Application in Coplanar Air Combat

【Author】 Luo Ningquan (罗宁泉)

【Supervisor】 Liu Changyou (刘长有)

【Author Information】 Shenyang University of Technology, Control Theory and Control Engineering, 2003, Master's thesis

【Abstract (Chinese)】 Differential game theory has served for nearly fifty years as the main tool for pursuit-evasion dynamic games. The theory itself is well developed, but it remains some distance from practical application, chiefly because it originates in optimal control theory: it requires a precise mathematical model, and its solution runs into nonlinear two-point boundary value problems and singular-surface problems.

In recent years, with the rise of artificial intelligence, many researchers at home and abroad have worked to bring intelligent control theory into differential game research. Intelligent guidance inevitably involves the automatic acquisition and use of knowledge. As a machine learning method, reinforcement learning can automate knowledge acquisition and broaden the range of knowledge resources available.

This thesis studies the dynamic game of coplanar two-aircraft air combat. By combining reinforcement learning with differential games, it avoids the difficulties of the traditional control-theoretic approach, which derives an optimal analytical solution from a precise mathematical model of the plant and a performance index. Air-combat policy rules are built from human fuzzy reasoning, discretizing the state space so as to shrink the action space and improve the efficiency of network learning.

To address the "curse of dimensionality" in traditional reinforcement learning and the structural credit-assignment problem, a BP neural network is used to approximate the Q-learning value function.

The simulation experiments account for many practical factors and use real aerodynamic parameters. The results verify the effectiveness of the proposed method and indicate that combining reinforcement learning with differential game theory, applied to air combat, is a promising direction.

The thesis first analyzes the importance of two-aircraft dogfighting and the development of its research methods, and gives the rationale and overall framework of the design. Chapter 2 introduces the characteristics, history and algorithms of reinforcement learning. Chapter 3 designs Q-learning-based intelligent air-combat guidance control and gives the air-combat policy rules. Chapter 4 presents simulation experiments on constant-speed and variable-speed models of horizontal-plane two-aircraft combat and analyzes the results.
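The Q-learning referenced above is, in its standard tabular form, the Watkins update (the notation below is the textbook convention, not notation taken from the thesis):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where $\alpha$ is the learning rate and $\gamma$ the discount factor. Replacing the table $Q(\cdot,\cdot)$ with a BP network, as the thesis does, turns the bracketed temporal-difference term into the regression error that backpropagation minimizes.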

【Abstract】 As the main tool for pursuit-evasion games, differential game theory has developed considerably over the past fifty years. However, its results are difficult to apply to real air combat. Most analytical studies require a precise mathematical model and must solve nonlinear two-point boundary value problems and singular-surface problems, which arise from the set of necessary conditions for game optimality, so accurate solutions are generally out of reach.

With the development of artificial intelligence, many recent studies have been devoted to combining intelligent control with differential game theory. Realizing intelligent control necessarily involves the automatic acquisition and use of knowledge. As a machine learning method, reinforcement learning not only provides this capability but can also broaden the range of knowledge that can be acquired.

Using a method that combines reinforcement learning with differential game theory, the coplanar air combat problem between two aircraft is analyzed. This approach avoids solving the tedious two-point boundary value problem derived from optimal control theory. Air-combat policy rules are built from human fuzzy logic, which decomposes the state space, shrinks the action space and improves the efficiency of the neural network.

Value-function approximation for reinforcement learning with a neural network is studied. This resolves both the "curse of dimensionality" in the reinforcement learning algorithm and the structural credit-assignment problem in learning.

The simulations account for many practical conditions and use realistic aerodynamic data. The results show the validity of applying reinforcement-learning-based differential games to coplanar air combat.

This paper is outlined as follows. The importance of two-aircraft combat and the methods used to study it are first analyzed, and the overall structure and rationale of the design are presented. Chapter 2 introduces the nature, history and algorithms of reinforcement learning. Chapter 3 gives the air-combat policy rules and discusses the intelligent guidance problem for air combat with reinforcement learning. Chapter 4 presents numerical simulations of the horizontal two-aircraft combat dynamics model for the constant-speed and variable-speed cases and analyzes the results.
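The value-function approximation described in the abstract can be sketched in miniature. The toy 1-D pursuit environment below, the network sizes, and all hyperparameters are illustrative assumptions for exposition, not the thesis's actual air-combat model; only the core idea is taken from the text: a small backpropagation-trained network stands in for the Q table, and each Q-learning target drives one gradient step.

```python
import numpy as np

# Toy 1-D pursuit: the pursuer sits at integer distance d in [1, 8] from a
# fixed evader; action 0 steps toward the evader, action 1 steps away.
# Capture (d == 0) yields reward +1 and ends the episode.
rng = np.random.default_rng(0)
GAMMA, ALPHA, MAX_D = 0.9, 0.05, 8

# One-hidden-layer "BP" network approximating Q(d, a) for both actions.
W1 = rng.normal(0, 0.5, (8, 1)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (2, 8)); b2 = np.zeros(2)

def q_values(d):
    """Forward pass: normalized distance in, two Q-values out."""
    h = np.tanh(W1 @ np.array([d / MAX_D]) + b1)
    return W2 @ h + b2, h

def train_step(d, a, target):
    """One gradient step on (Q(d,a) - target)^2, backprop by hand."""
    global W1, b1, W2, b2
    q, h = q_values(d)
    err = q[a] - target                      # dLoss/dQ(d,a)
    gW2 = np.zeros_like(W2); gW2[a] = err * h
    gb2 = np.zeros_like(b2); gb2[a] = err
    gz = err * W2[a] * (1 - h ** 2)          # back through tanh layer
    W2 -= ALPHA * gW2; b2 -= ALPHA * gb2
    W1 -= ALPHA * np.outer(gz, [d / MAX_D]); b1 -= ALPHA * gz

for episode in range(3000):
    d = int(rng.integers(1, MAX_D + 1))
    for _ in range(2 * MAX_D):
        q, _ = q_values(d)
        # Epsilon-greedy exploration over the two actions.
        a = int(rng.integers(2)) if rng.random() < 0.2 else int(np.argmax(q))
        d2 = min(max(d + (-1 if a == 0 else 1), 0), MAX_D)
        if d2 == 0:                          # capture: terminal target is r
            train_step(d, a, 1.0)
            break
        q2, _ = q_values(d2)                 # bootstrapped Q-learning target
        train_step(d, a, GAMMA * np.max(q2))
        d = d2

# Greedy policy learned by the network, one action per start distance.
policy = [int(np.argmax(q_values(d)[0])) for d in range(1, MAX_D + 1)]
print(policy)
```

The point of the sketch is the thesis's remedy for the "curse of dimensionality": the network stores a compact parametric Q-function instead of one table entry per discretized state, so the same machinery scales to the continuous positions, headings and speeds of an air-combat state.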

  • 【Classification Code】 E84
  • 【Downloads】 144