Research on RoboCup Local Strategy Based on Multi-Agent Reinforcement Learning
【Author】 Li Jin
【Supervisor】 Liu Quan
【Author Information】 Soochow University, Computer Application Technology, 2012, Master's thesis
【摘要/Abstract】 Reinforcement learning is an important method in artificial intelligence for solving learning-control problems. However, when classical reinforcement learning algorithms are applied to the training of RoboCup local strategies, they still converge slowly and cannot effectively handle the environmental uncertainty, multi-agent cooperation and communication, and multi-goal characteristics that arise during training. To address the two problems of slow convergence and the multi-goal nature of local-strategy training, this thesis proposes corresponding improvements. The main research content covers the following four aspects: (1) To address the slow convergence and the tendency to fall into local optima of the cumulative immediate-reward formulation, a non-cumulative immediate-reward formulation is proposed and combined with classical reinforcement learning, yielding a reinforcement learning method based on non-cumulative immediate rewards. The method is applied to 1-vs-1 shooting training in robot soccer; the experimental results show that, on this problem, the non-cumulative formulation converges faster and trains better than the cumulative one. (2) To address the inherently slow convergence of average-reward reinforcement learning, an improved reinforcement learning algorithm is proposed. To handle the large state space that arises during training and to improve generalization, the algorithm uses a BP neural network as its function approximator. Applied to the Keepaway local training task, the algorithm converges quickly and generalizes well. (3) For multi-goal reinforcement learning, a multi-goal algorithm based on the maximum set expected loss, the LRGM algorithm, is proposed. The algorithm estimates the maximum set expected loss of each goal and, while balancing the goals, selects the best joint action to produce an optimal joint policy. (4) To address the non-convergence of reinforcement learning combined with nonlinear function approximation, a Sarsa(λ) algorithm based on an improved MSBR error function is proposed; its convergence is proved, and the action-selection probability function and the step-size parameter are optimized. Combined with the multi-goal algorithm LRGM and applied to RoboCup 2-vs-2 shooting local-strategy training, it achieves good results, and the experiments demonstrate the effectiveness of the learning algorithm.
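For reference, contributions (1) and (2) above modify the two standard reward criteria of reinforcement learning. A brief sketch in LaTeX of the standard forms they start from (the thesis's non-cumulative and improved average-reward variants are not reproduced here): the expected discounted cumulative return, the average-reward criterion, and Schwartz's R-Learning updates.

G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1},
\qquad
\rho^{\pi} = \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}\Big[ \sum_{t=1}^{n} r_t \Big]

% R-Learning (average-reward) tabular updates:
Q(s,a) \leftarrow Q(s,a) + \beta \big[\, r - \rho + \max_{a'} Q(s',a') - Q(s,a) \,\big]
\rho \leftarrow \rho + \alpha \big[\, r + \max_{a'} Q(s',a') - \max_{a} Q(s,a) - \rho \,\big] \quad \text{(applied only after a greedy action)}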
【Abstract】 Reinforcement learning has become a central paradigm for solving learning-control problems in artificial intelligence. Traditional reinforcement learning suffers from slow convergence and cannot readily be used in applications involving uncertain environments, multiple agents, and multiple goals; RoboCup training exhibits all of these problems. To address the slow convergence and the multi-goal nature of the task, several improved algorithms are proposed in this thesis. The main research contents are as follows: i. The expected cumulative reward cannot be used in all applications; it converges slowly because low rewards accumulate, and it takes time for the effect of a sub-optimal policy to fade away. To solve these problems, a non-cumulative reward is proposed, together with a reinforcement learning model based on it. The algorithm is applied to shooting training in RoboCup; the experimental results show that it has clear advantages over reinforcement learning with the expected cumulative reward. ii. R-Learning suffers from slow convergence and sensitivity to its parameters. To speed up convergence, an improved R-Learning algorithm is proposed that uses a BP neural network as the approximation function to generalize over the state space. The Keepaway experiments show that the proposed algorithm converges faster and generalizes well. iii. To solve the multi-goal problem in RoboCup, a novel multi-goal reinforcement learning algorithm, LRGM, is proposed. The algorithm estimates the lost reward of the greatest mass of sub-goals and trades off the long-term rewards of the sub-goals to obtain a composite policy. iv. A B error function for the single learning module, based on the MSBR error function, is proposed. The B error function guarantees the convergence of value prediction with nonlinear function approximation. The action-selection probability and the step-size parameter α are also improved with respect to the B error function. The experimental results of 2-vs-2 shooting show that LRGM-Sarsa(λ) is more stable and converges faster.
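As an illustration of the kind of learner described in contributions ii and iv, the following is a minimal Python sketch of Sarsa(λ) with eligibility traces kept over the weights of a small BP-style (one-hidden-layer) neural network used as the action-value approximator. It is not the thesis's implementation: the environment interface (reset/step), the network size, and all hyper-parameters are illustrative assumptions, and the LRGM module and the modified MSBR/B error function are not reproduced.

# Minimal sketch (not the thesis code): Sarsa(lambda) with a one-hidden-layer
# neural network as Q(s, a) approximator and accumulating eligibility traces.
import numpy as np

class NeuralQ:
    """Q(s, .) approximated by a one-hidden-layer network, one output per action."""
    def __init__(self, n_features, n_actions, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)

    def forward(self, s):
        h = np.tanh(self.W1 @ s + self.b1)   # hidden-layer activations
        return self.W2 @ h + self.b2, h      # Q-values for every action, plus h

    def grads(self, s, a):
        """Gradient of Q(s, a) with respect to every weight (used for the traces)."""
        _, h = self.forward(s)
        dW2 = np.zeros_like(self.W2)
        db2 = np.zeros_like(self.b2)
        dW2[a] = h
        db2[a] = 1.0
        dh = self.W2[a] * (1.0 - h ** 2)     # back-propagate through tanh
        return [np.outer(dh, s), dh, dW2, db2]

    def params(self):
        return [self.W1, self.b1, self.W2, self.b2]

def epsilon_greedy(q, s, epsilon):
    qs, _ = q.forward(s)
    if np.random.random() < epsilon:
        return np.random.randint(len(qs))
    return int(np.argmax(qs))

def sarsa_lambda_episode(env, q, alpha=0.01, gamma=0.99, lam=0.8, epsilon=0.1):
    """Run one episode of Sarsa(lambda) with accumulating traces over the weights.
    `env` is assumed to expose reset() -> state and step(a) -> (next_state, reward, done)."""
    traces = [np.zeros_like(p) for p in q.params()]
    s = env.reset()
    a = epsilon_greedy(q, s, epsilon)
    done = False
    while not done:
        s2, r, done = env.step(a)
        q_sa = q.forward(s)[0][a]
        if done:
            delta = r - q_sa                                   # terminal TD error
        else:
            a2 = epsilon_greedy(q, s2, epsilon)
            delta = r + gamma * q.forward(s2)[0][a2] - q_sa    # Sarsa TD error
        for e, g in zip(traces, q.grads(s, a)):
            e *= gamma * lam                                   # decay the trace
            e += g                                             # accumulate the gradient
        for p, e in zip(q.params(), traces):
            p += alpha * delta * e                             # semi-gradient update
        if not done:
            s, a = s2, a2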
【Key words】 reinforcement learning; multi-goal; non-cumulative reward; RoboCup training