Research on Cooperation and Coordination in Multi-agent Systems

【Author】 Xiao Zheng (肖正)

【Advisor】 Zhang Shiyong (张世永)

【Author information】 Fudan University, Computer Application Technology, 2009, Ph.D.

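【Abstract (Chinese)】 Pervasiveness, networking, intelligence, agent orientation, and humanization are the overall trends in the development of automated computing, and multi-agent computing is a new, advanced computing paradigm that has emerged in this process, following distributed computing and P2P computing. Its way of solving problems resembles human thinking. Unlike traditional algorithm design, it does not require a comprehensive analysis of the problem: one only specifies the agents’ goals, and the agents reach the user’s goals automatically and step by step through their interactions. Building multi-agent systems for large distributed problems makes computer systems more intelligent and lets them take over more human work; agent-oriented software engineering makes programming more human-centered, bringing software design closer to how people think; and agent-based social simulation combines computer science with sociology, letting computing play a positive role in the humanities. Multi-agent computing therefore helps advance computer technology further.

Much research effort is still needed before multi-agent computing truly delivers the properties its concept promises. For agent-based systems, agent construction, communication language design, and cooperation and coordination are the most immediate key problems awaiting solution, and the ability of agents to interact for cooperation and coordination is what distinguishes multi-agent computing from other computing paradigms. As in human society, cooperation and coordination are an important way to solve large, complex problems. This dissertation explores cooperation and coordination in multi-agent systems and reports results in several sub-areas.

Organization building, coalition formation, and task allocation are the main directions of research on multi-agent cooperation. Organizations and coalitions are the basis of cooperation, while task allocation instantiates cooperative relationships. For task allocation, taking into account that agents differ in network topology and capability level, and building on earlier task scheduling work in parallel computing, this dissertation proposes two topology-adaptive task allocation algorithms for cooperative heterogeneous agents. The first accounts for both characteristics and finds the optimal agent combination by exhaustive search; the second uses heuristic search to lower the time complexity and obtains a suboptimal agent combination. These algorithms are of no use in large-scale multi-agent systems where tasks arrive dynamically, so the dissertation goes on to study the dynamic allocation of multiple task flows and proposes a distributed, self-adaptive allocation algorithm based on Q-learning. The algorithm adapts to the arrival process of its own task flow while fully accounting for the arrival and allocation of other task flows. Its distributed design suits open multi-agent systems in which each agent has only a local view, and reinforcement learning lets allocation decisions adapt to the system’s task load and distribution. The algorithm achieves relatively high task throughput and relatively low average task execution time.

Research on coordination in multi-agent systems falls into three areas: group mental state models, multi-agent planning, and agent social norms, each with its own strengths for coordinating agents. This dissertation’s work continues the multi-agent planning line. The plans produced by its two models are no longer sequences of actions in the traditional sense but policies by which agents select actions while pursuing their goals, which makes the plans more flexible. Multi-agent learning is a widely studied and promising way to derive such policies. The dissertation analyzes the conflict game, a common form of competition between agents, defines an agent’s best-response policy using the Nash equilibrium concept of matrix games, and then finds that policy with model-free reinforcement learning. The resulting policy greatly reduces the number of conflicts and improves the coordination of agents’ behavior; measured by long-term utility it is also reasonably fair, which favors system stability. For coordination in general-sum games, many existing algorithms are easily exploited and thereby lose payoff. After analyzing two important properties of agent policies, time variation and adaptivity, the dissertation argues that dynamic policies with both properties help agents make more rational decisions, avoid being exploited in mixed multi-agent environments, and respond to each type of agent so as to maximize their own payoff.

Once agents are applied at scale, the agent society will become a special kind of multi-agent system in which agents’ social attributes matter more and more. Besides mental attributes such as beliefs, intentions, and desires, personality will strongly influence agents’ action selection, and modeling other agents by their personality helps produce better-coordinated policies. The dissertation incorporates personality into action selection and, using qualitative decision theory, builds a personalized action selection model in which different qualitative decision principles correspond to different personality traits; actions chosen under these principles diversify agent behavior. Because personality is complex and hard to describe, while artificial neural networks are good at representing functions that humans find hard to understand, a second personalized action selection model based on neural networks is proposed. Compared with the first, it represents personality more expressively and captures subtler personality types. In addition, a multi-agent simulation platform is built on Swarm, a toolkit for simulating complex adaptive systems, and a case study examines the practical use of personality, making the importance and practical value of personality research clearer. Although these models rest on fairly simple principles, they are a new attempt and a preliminary exploration beyond the traditional symbolic-logic study of agent mental states, and they make it possible to reflect the chaotic complexity of society from multiple perspectives.

In summary, taking cooperation and coordination mechanisms in multi-agent systems as its subject, and through broad surveys and in-depth exploration, this dissertation proposes the following models and algorithms for three problems, namely task allocation, learning-based behavior coordination, and personalized action selection:
  • a static task allocation algorithm for cooperative heterogeneous agents that adapts to network topology;
  • a dynamic allocation algorithm for multiple task flows based on Q-learning;
  • a regret-based reinforcement learning mechanism for multi-agent conflict games;
  • a reinforcement learning mechanism for dynamic policies in general-sum games under mixed multi-agent environments;
  • a personalized agent action selection model based on qualitative decision theory;
  • a personalized agent action selection model based on artificial neural networks.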
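The abstract does not give the cost model behind the two static allocation algorithms (exhaustive versus heuristic search). The Python sketch below only illustrates that trade-off under a hypothetical cost model: a task of total work `work` is split evenly over `k` agents, each agent processes its share at the rate given by its `capability`, and the group pays a penalty `hop_cost` per hop of its diameter in the agent network. All function names are illustrative and not taken from the dissertation.

```python
from collections import deque
from itertools import combinations

def hop_distances(adj):
    """All-pairs hop counts on an unweighted, connected agent graph (BFS)."""
    dist = {}
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[src] = seen
    return dist

def completion_time(group, k, work, capability, dist, hop_cost):
    """Hypothetical cost: the slowest member's time on an equal 1/k share of
    the work, plus hop_cost times the group's largest pairwise hop distance."""
    share = work / k
    compute = max(share / capability[a] for a in group)
    diameter = max((dist[a][b] for a, b in combinations(group, 2)), default=0)
    return compute + hop_cost * diameter

def exhaustive_allocation(k, work, capability, dist, hop_cost):
    """Optimal k-agent group by scoring all C(n, k) combinations."""
    return min(combinations(sorted(capability), k),
               key=lambda g: completion_time(g, k, work, capability, dist, hop_cost))

def greedy_allocation(k, work, capability, dist, hop_cost):
    """Cheaper but suboptimal: seed with the most capable agent, then add the
    agent that keeps the partial group's cost lowest (O(n*k) evaluations)."""
    group = [max(capability, key=capability.get)]
    while len(group) < k:
        rest = [a for a in sorted(capability) if a not in group]
        best = min(rest, key=lambda a: completion_time(
            group + [a], k, work, capability, dist, hop_cost))
        group.append(best)
    return tuple(sorted(group))

# Example: five agents on a path A-B-C-D-E with different capabilities.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
cap = {"A": 4.0, "B": 1.0, "C": 1.0, "D": 3.0, "E": 3.0}
dist = hop_distances(adj)
print(exhaustive_allocation(2, 12.0, cap, dist, 1.0))  # ('D', 'E'), cost 3.0
print(greedy_allocation(2, 12.0, cap, dist, 1.0))      # ('A', 'D'), cost 5.0
```

Here the greedy heuristic anchors on the single most capable agent A and settles for a group costing 5.0, while the exhaustive search finds the adjacent pair D and E at cost 3.0. In exchange, the heuristic scores O(n·k) groups instead of all C(n, k).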

【Abstract】 Automatic computing is moving toward pervasive, network-oriented, intelligent, agent-based, and humanized computation. Multi-agent computing is an advanced computing paradigm that emerged after distributed computing and peer-to-peer computing. Its problem-solving process is very close to the way human beings think. Unlike traditional algorithm design, which requires a comprehensive analysis of the problem, multi-agent computing only needs to assign agents their goals; the agents then achieve the user’s goals automatically through active interaction. Building multi-agent systems for large distributed problems makes computer systems more intelligent and further relieves people of their work. Agent-oriented software engineering makes programming more humanized, aligning software design with how people think. Agent-based social simulation combines computer science and sociology, carrying computer technology into the humanities. Multi-agent computing is therefore expected to help computer technology prosper. However, much effort is still needed before multi-agent computing can deliver the outstanding properties its concept promises. For agent-based systems, agent construction, communication language design, and mechanisms of cooperation and coordination are three key problems to be solved urgently. Among them, the ability to interact for the purpose of cooperation and coordination is what distinguishes multi-agent computing from other computing paradigms.
As in human society, cooperation and coordination are an important means of solving large and complicated problems. This dissertation investigates this issue and reports results in several sub-areas. Research on cooperation among multiple agents concentrates on organization building, coalition formation, and task allocation. Organizations and coalitions are the infrastructure of multi-agent cooperation, while task allocation instantiates the cooperative relationships among agents. For task allocation in multi-agent systems, taking into account agent topology and differing capability levels, and building on earlier task allocation algorithms from parallel computing, we propose two topology-adaptive task allocation algorithms for cooperative, heterogeneous agents. The first obtains the optimal agent combination by brute-force search, taking both topology and heterogeneity into account. The second obtains a suboptimal allocation scheme with lower time complexity. In large-scale multi-agent systems where tasks arrive dynamically, these algorithms are inadequate. Hence we study the allocation of multiple task flows and propose a distributed, self-adaptive algorithm based on Q-learning. The algorithm not only adapts to the arrival process of its own task flow but also fully considers the influence of the task flows arriving at other agents. Its distributed design makes it applicable to open multi-agent systems in which each agent has only a local view, and reinforcement learning makes allocation adapt to system load and node distribution. The algorithm is shown to improve task throughput and decrease the average execution time per task. As for coordination in multi-agent systems, related work can be divided into three parts: colony mental state models, multi-agent planning, and social laws.
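The abstract describes the Q-learning allocator only at a high level. As an illustration, the sketch below uses a Q-routing-style update (after Boyan and Littman), swapped in here rather than reproduced from the dissertation: each agent sees only its neighbours and learns `q[a]`, the estimated time to finish a task if it runs it locally or forwards it to neighbour `a`, bootstrapping from the neighbour's own best estimate. Assumptions: a fixed queue length `load` stands in for the other task flows, `delay` is a uniform link delay, and `AllocatorAgent` and `dispatch` are hypothetical names.

```python
import random

class AllocatorAgent:
    """Hypothetical Q-routing-style allocator. An agent sees only itself and its
    neighbours and learns q[a], the estimated time to finish a task if it takes
    action a ("local" = run the task here, otherwise forward it to neighbour a)."""

    def __init__(self, neighbours, alpha=0.5, epsilon=0.1, seed=0):
        self.actions = ["local"] + list(neighbours)
        self.q = {a: 0.0 for a in self.actions}  # 0 is optimistic for a cost
        self.alpha = alpha                       # learning rate
        self.epsilon = epsilon                   # exploration probability
        self.rng = random.Random(seed)

    def greedy(self):
        return min(self.actions, key=self.q.get)  # lowest estimated time

    def best(self):
        return min(self.q.values())

    def choose(self):
        if self.rng.random() < self.epsilon:      # occasional exploration
            return self.rng.choice(self.actions)
        return self.greedy()

    def update(self, action, target):
        self.q[action] += self.alpha * (target - self.q[action])

def dispatch(agents, env, name, hops_left):
    """Route one task arriving at agent `name`; return its completion time."""
    agent = agents[name]
    action = "local" if hops_left == 0 else agent.choose()
    if action == "local":
        # wait behind `load` queued tasks from other flows, then run this one
        elapsed = (env["load"][name] + 1) * env["service"][name]
        agent.update("local", elapsed)
        return elapsed
    elapsed = env["delay"] + dispatch(agents, env, action, hops_left - 1)
    # bootstrapped target: link delay + the neighbour's best current estimate
    agent.update(action, env["delay"] + agents[action].best())
    return elapsed

env = {"service": {"A": 2.0, "B": 1.0, "C": 3.0},
       "load": {"A": 4, "B": 0, "C": 1},
       "delay": 1.0}
agents = {"A": AllocatorAgent(["B", "C"]), "B": AllocatorAgent([]),
          "C": AllocatorAgent([])}
for _ in range(200):
    dispatch(agents, env, "A", hops_left=2)
print(agents["A"].greedy())  # B
```

Agent A, slow and heavily loaded, learns to forward to B (about 1 + 1 = 2 time units) rather than run the task itself (5 × 2 = 10) or forward to C (1 + 2 × 3 = 7). Because the learning rate is constant, the same updates keep tracking the completion times if the loads change.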
Each has its own advantages. Our work on this problem extends research on multi-agent planning. The plans produced by our two models, however, are action selection policies for achieving a given goal rather than a fixed sequence of actions in the traditional sense, and stochastic policies make plans more flexible. Multi-agent learning is a promising method for obtaining action policies. In this dissertation we analyze the conflict game, which captures a competitive relationship between agents that arises frequently in multi-agent domains, define an agent’s optimal response policy based on the Nash equilibrium of matrix games, and then find such a policy using reinforcement learning, a model-free method. The policy obtained by this model substantially reduces the frequency of conflicts, enhancing the coordination of agents’ behaviors. Furthermore, in terms of long-term utility the policy is fair to some extent, which favors system stability. In general-sum games, many algorithms are easily exploited and consequently obtain less utility. After examining two important attributes of agent policies, time variation and adaptability, we argue that dynamic policies with both attributes help agents make more rational, payoff-maximizing decisions and responses while avoiding the risk of being exploited in mixed multi-agent environments. Once agents are deployed on a large scale, the agent society will become a special multi-agent system, and the social attributes of agents will become increasingly important. Apart from mental states such as belief, desire, and intention, personality will also play an important role in agent action selection, and modeling other agents according to their personality helps in forming better-coordinated policies. Against this background, we incorporate personality into agent action selection and build an individualized action selection model based on qualitative decision theory. Different qualitative decision-making principles correspond to different personalities, and selecting actions according to these principles diversifies agents’ actions.
Furthermore, because personality is complex and hard to describe, and artificial neural networks can represent functions that are difficult to understand, a new individualized action selection model based on neural networks is proposed. Compared with the model based on qualitative decision theory, this model describes personality more expressively, from extreme to subtle types. In addition, a simulation platform for multi-agent systems is developed with the Swarm toolkit for modeling complex adaptive systems, and the application of personality is investigated through a practical case, making the significance and practical value of personality more explicit. Although the principles behind these models are simple, they are a new attempt and a preliminary exploration of agent mental states beyond traditional symbolic logic, making it possible to reflect the chaos and complexity of agent society from multiple perspectives. In sum, with cooperation and coordination in multi-agent systems as its research subject, and through broad investigation and deep exploration, this dissertation proposes the following models and algorithms for task allocation, learning-based behavior coordination, and individualized action selection:
  • Algorithm for task allocation adaptable to network topology among cooperative heterogeneous agents
  • Algorithm for dynamic allocation of multiple task flows based on Q-learning
  • Mechanism of reinforcement learning for multi-agent conflict games based on regret values
  • Mechanism of reinforcement learning of dynamic policies for general-sum games in mixed multi-agent environments
  • Model of individualized agent action selection based on qualitative decision theory
  • Model of individualized agent action selection based on artificial neural networks
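The dissertation's conflict-game mechanism is model-free and defined through the Nash equilibrium of matrix games, and the abstract gives no further detail. As a stand-in, the sketch below shows regret matching (after Hart and Mas-Colell), a standard regret-based learning rule, in self-play on a hypothetical two-agent conflict game over one shared resource. Unlike a model-free learner, each agent here is assumed to know its payoff matrix `PAYOFF` and to observe the opponent's action; `RegretMatcher` and `play` are hypothetical names.

```python
import math
import random

# Hypothetical symmetric conflict game over one shared resource.
# Actions: 0 = "go", 1 = "yield"; PAYOFF[a][b] = payoff for playing a vs. b.
PAYOFF = [[-1.0, 1.0],   # go: collision (-1) or the resource (+1)
          [0.0, 0.0]]    # yield: nothing either way

class RegretMatcher:
    """Regret matching on external regret: play each action with probability
    proportional to its positive cumulative regret (uniform if none)."""

    def __init__(self, n_actions=2):
        self.regret = [0.0] * n_actions

    def strategy(self):
        pos = [max(r, 0.0) for r in self.regret]
        total = sum(pos)
        if total == 0.0:
            return [1.0 / len(pos)] * len(pos)
        return [p / total for p in pos]

    def observe(self, sigma, opp_action):
        # regret of each pure action vs. the expected payoff of playing sigma
        expected = sum(s * PAYOFF[a][opp_action] for a, s in enumerate(sigma))
        for a in range(len(self.regret)):
            self.regret[a] += PAYOFF[a][opp_action] - expected

def play(rounds, seed=0):
    """Self-play of two regret matchers; returns them and the conflict rate."""
    rng = random.Random(seed)
    agents = [RegretMatcher(), RegretMatcher()]
    conflicts = 0
    for _ in range(rounds):
        sigmas = [ag.strategy() for ag in agents]
        acts = [0 if rng.random() < s[0] else 1 for s in sigmas]
        if acts == [0, 0]:          # both went for the resource
            conflicts += 1
        agents[0].observe(sigmas[0], acts[1])
        agents[1].observe(sigmas[1], acts[0])
    return agents, conflicts / rounds

ROUNDS = 10_000
agents, conflict_rate = play(ROUNDS)
avg_regret = max(max(ag.regret) for ag in agents) / ROUNDS
bound = math.sqrt(2 / ROUNDS)       # worst-case bound for this payoff matrix
print(conflict_rate, avg_regret, avg_regret <= bound)
```

Because regrets are measured against the expected payoff of the mixed strategy actually played, and for this matrix each per-round regret lies in [-1, 1], each agent's average regret after T rounds is at most √(2/T) whatever the other agent does. The returned conflict rate can be compared across payoff settings.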
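The qualitative-decision model above maps decision principles to personality traits. A minimal sketch of that idea, with a hypothetical outcome table, hypothetical action names, and a hypothetical trait-to-principle mapping (`PERSONALITY`): maximin and maximax compare outcomes only ordinally, whereas minimax regret also uses differences between outcomes.

```python
# Hypothetical outcome table: rows are the agent's actions, columns are states of
# the world the agent cannot attach probabilities to, values are utilities.
OUTCOMES = {
    "attack":  [9, 1, 2],
    "defend":  [4, 4, 3],
    "explore": [7, 2, 6],
}

def maximin(table):
    """Pessimistic / cautious: best worst-case outcome (purely ordinal)."""
    return max(table, key=lambda a: min(table[a]))

def maximax(table):
    """Optimistic / adventurous: best best-case outcome (purely ordinal)."""
    return max(table, key=lambda a: max(table[a]))

def minimax_regret(table):
    """Regret-averse: smallest worst-case shortfall from the best action in
    each state (needs utility differences, so it is not purely ordinal)."""
    n_states = len(next(iter(table.values())))
    best = [max(row[s] for row in table.values()) for s in range(n_states)]
    return min(table, key=lambda a: max(best[s] - table[a][s] for s in range(n_states)))

# Hypothetical mapping from personality traits to decision principles.
PERSONALITY = {"cautious": maximin, "adventurous": maximax,
               "regret-averse": minimax_regret}

def choose_action(trait, table=OUTCOMES):
    return PERSONALITY[trait](table)

for trait in PERSONALITY:
    print(trait, choose_action(trait))
# prints, one per line: cautious defend, adventurous attack, regret-averse explore
```

The same table yields three different choices, which is the kind of behavioral diversity the abstract attributes to personality-dependent decision principles.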

  • 【Online publication submitter】 Fudan University
  • 【Online publication year/issue】 2009, Issue 12