
基于马尔可夫决策过程理论的Agent决策问题研究

Research on Agent Decision Problem Based on Markov Decision Process Theory

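【Author】 Shi Ke (石轲)

【Advisor】 Chen Xiaoping (陈小平)

【Author information】 University of Science and Technology of China, Computer Application Technology, 2010, Master's degree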

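【Abstract (translated from the Chinese)】 A principal goal of artificial intelligence is widely held to be the construction of Agents that can decide on intelligent behavior, that is, Agents able to reproduce in many respects the intelligent behavior humans are capable of. Markov decision processes (MDPs) can be used to describe and handle Agent decision problems in large-scale uncertain environments. The RoboCup Robot World Cup is a major international competition and academic event held to promote research and development in distributed artificial intelligence, intelligent robotics, and related fields; the RoboCup Simulation 2D competition is the branch among RoboCup's events that focuses on Agent decision making. Building on the theory of Markov decision processes and using the RoboCup Simulation 2D competition as the experimental platform, this thesis studies problems related to Agent decision making. Its main work falls into three parts. First, the thesis restructures and implements a complete RoboCup Simulation 2D team decision system, WE2009. The system takes the partially observable stochastic game (POSG) model as its theoretical basis and comprises three modules: information processing, high-level decision making, and behavior execution. The high-level decision module in particular adopts a structure built on independent behavior generators, which both makes full use of the Agent's decision time and improves the efficiency of team cooperation. Second, the thesis proposes a special class of Markov decision process, the action-driven Markov decision process (ADMDP). It analyzes the theoretical model of ADMDP and proposes a solution method that combines offline value iteration with online search; the method is applied here to the proximal dribble problem (dribbling while keeping the ball close to the body) in the RoboCup Simulation 2D competition and considerably improves the Agent's dribbling performance. Third, the thesis proposes a special class of Markov game, the formation-based zero-sum Markov game (FZSMG). It analyzes the theoretical model of FZSMG and uses it to describe the Anti-Mark problem in the RoboCup Simulation 2D competition. For the Anti-Mark problem it proposes a heuristic solution method based on formation change, which gave the team good results against opponents that use man-marking defense. All of this work is implemented on WE2009, which after completion took part in two major competitions, the RoboCup 2009 World Cup and the 2009 China Robot Competition, and won the championship in both.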

【Abstract】 The goal of Artificial Intelligence is widely regarded as constructing Agents that can make intelligent decisions, that is, Agents that reproduce intelligent human behavior in many respects. Markov Decision Processes (MDPs) can be used to describe and solve Agent decision problems in large, uncertain environments. RoboCup is an international competition and scientific event that promotes research in distributed Artificial Intelligence, intelligent robotics, and related fields. The 2D competition of the soccer simulation league is the branch of RoboCup that emphasizes Agent decision problems. In this dissertation, we study Agent decision problems on the basis of MDP theory, using RoboCup 2D soccer simulation as the test bed. The three main contributions of this dissertation are as follows. First, we design and implement a complete 2D soccer simulation team system called WE2009. WE2009 is based on the theory of Partially Observable Stochastic Games (POSG) and consists of three modules: message parsing, high-level decision making, and low-level actions. The high-level decision module adopts a structure based on independent behavior generators, which not only makes full use of the available decision time but also increases the efficiency of teamwork. Second, we propose a special kind of MDP called the Action-Driven Markov Decision Process (ADMDP). We analyze the theoretical model of ADMDP and propose an algorithm for solving it. This algorithm, which combines offline value iteration with online search, is applied to the proximal dribble problem in 2D soccer simulation. Empirical results show that it improves the Agent’s dribbling performance considerably over our team's previous algorithm. Third, we propose a special kind of Markov Game called the Formation-based Zero-Sum Markov Game (FZSMG). We analyze the theoretical model of FZSMG and use it to describe the Anti-Mark problem in 2D soccer simulation. We propose a new heuristic method based on formation change to solve the Anti-Mark problem, which performs well against opponents that rely on a man-marking defense. All of the above work is implemented in the WE2009 2D soccer simulation team. The team participated in RoboCup 2009 and RoboCup China Open 2009 and won the championship in both.
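
The abstract does not give the ADMDP algorithm itself, so the following is only a generic sketch of the two ingredients it names: value iteration run offline to obtain state values, followed by a one-step lookahead at decision time. The toy states ("far", "near", "goal", "lost"), the actions, the probabilities, the rewards, and all function names are hypothetical and are not taken from the thesis or from WE2009.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP (not from the thesis):
# mdp[state][action] = list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TOY_MDP: Transitions = {
    "far": {
        "dribble": [(0.8, "near", 0.0), (0.2, "lost", -1.0)],
        "hold": [(1.0, "far", 0.0)],
    },
    "near": {
        "dribble": [(0.7, "goal", 1.0), (0.3, "lost", -1.0)],
        "hold": [(1.0, "near", 0.0)],
    },
    "goal": {},  # terminal state: no actions
    "lost": {},  # terminal state: no actions
}


def q_value(mdp: Transitions, values: Dict[str, float],
            state: str, action: str, gamma: float) -> float:
    """Expected discounted return of taking `action` in `state`."""
    return sum(p * (r + gamma * values[nxt])
               for p, nxt, r in mdp[state][action])


def value_iteration(mdp: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8) -> Dict[str, float]:
    """Offline step: sweep Bellman backups until values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for state, actions in mdp.items():
            if not actions:  # terminal states keep value 0
                continue
            best = max(q_value(mdp, values, state, a, gamma) for a in actions)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def online_choice(mdp: Transitions, values: Dict[str, float],
                  state: str, gamma: float = 0.9) -> str:
    """Online step: one-step lookahead using the precomputed values."""
    return max(mdp[state], key=lambda a: q_value(mdp, values, state, a, gamma))


if __name__ == "__main__":
    offline_values = value_iteration(TOY_MDP)
    print(offline_values)
    print(online_choice(TOY_MDP, offline_values, "far"))
```

In a real-time setting the offline table would be computed before the match, and only the cheap lookahead would run inside each decision cycle; a deeper online search could replace the one-step lookahead where time allows.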
