Research on Several Communication Technologies in Multi-Agent Robotic Systems

Key Communication Technology in Multi-Agent Robotic System

【Author】 Liu Haitao (刘海涛)

【Advisor】 Hong Bingrong (洪炳镕)

【Author information】 Harbin Institute of Technology, Computer Application Technology, 2007, PhD

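【Summary】 Using communication to improve coordinated control in multi-agent robotic systems has become one of the research hotspots in the multi-robot and multi-agent fields in recent years. How information is shared through communication is critical to cooperation and coordination. This thesis introduces the communication modes of multi-agent robotic systems, systematically summarizes and reviews the main communication-related research topics and methods in cooperation, and compares and analyzes typical recent modeling methods for communication-based distributed control systems. On this basis, it analyzes and studies in depth several key problems in communication-based coordinated control of multi-agent robotic systems. The specific contributions are as follows.

A centralized model of coordinated control under free communication is established. The communication cost is parameterized and introduced into the model, giving a centralized control model of team coordination under free communication; that is, free communication reduces a multi-agent partially observable Markov decision process (POMDP) to a single-agent POMDP. To find approximately optimal policies for POMDPs with uncertainty, a new method is proposed that estimates the optimal POMDP solution using reinforcement learning combined with an evolutionary algorithm. A Memetic algorithm evolves the policies, while Q-learning produces predicted rewards that serve as the fitness values of the evolved policies. To address the hidden-state problem, a finite number of steps of the agent's most recent deterministic history is memorized and combined with the belief state, which represents a probability distribution over all possible states, so that the two jointly determine the current optimal policy. A hybrid search method improves search efficiency: an adjustment factor maintains population diversity and guides the combined crossover and mutation operations. Experimental results on typical POMDP benchmark problems show that the proposed algorithm outperforms other approximate POMDP algorithms. Finally, experiments on the coordination of multi-agent robotic systems under free communication validate its effectiveness.

Free communication reduces the computational complexity of a multi-agent POMDP to that of a single-agent POMDP, but in practice communication is not free, and it is often desirable to reduce the amount of communication needed for coordination. A new decentralized communication decision algorithm is therefore proposed. It represents the team's possible joint beliefs with a directed acyclic graph and makes communication decisions in a decentralized manner on that basis: an agent chooses to communicate only when its own observations indicate that sharing information would raise the expected reward. By maintaining and reasoning over the team's possible joint beliefs, centralized single-agent policies are applied to decentralized multi-agent POMDP problems. Experiments and a detailed example show that the proposed DAG_DEC_COMM decentralized communication decision algorithm effectively reduces the use of communication resources while improving the performance of decentralized execution.

Unreliable communication is a basic feature of many real multi-agent application domains. Limited bandwidth, interference, and loss of line of sight are the main causes of communication failure. Within the distributed constraint optimization problem framework, this thesis studies an improved distributed constraint reasoning algorithm that runs effectively under unreliable communication. To cut unnecessary communication and improve performance, the Adopt algorithm is modified so that, while liveness is preserved, fewer messages are needed to find the optimal solution. In addition, the causes of message loss are analyzed, and an improved method is proposed that accounts for message loss arising from both kinds of causes. The results show that the improved Adopt algorithm is guaranteed to terminate with the optimal solution even when communication is unreliable, and that the time to solution degrades gracefully as the message loss probability increases.

In recent years joint multi-agent operations have received considerable attention, and mixed human-agent teams have been widely applied. This thesis studies, designs, and implements a multi-agent human-robot mixed team system based on mobile information devices. It first proposes an architecture for such a system, then designs and implements the communication system between humans and robots and among multiple robots based on mobile information devices, realizing information sharing among team members. Finally, experiments validate the approach: users can interact with robots in a natural and convenient way and complete remote monitoring tasks, and the robots, through communication, build a team environment model from each robot's local environment model, which helps improve team coordination.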
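The belief state mentioned above, a probability distribution over all possible states, is normally maintained with the standard Bayesian POMDP update b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s). The following is a minimal sketch of that textbook update only; it is not taken from the thesis, and the function name `update_belief`, the two-state example, and the 0.85 observation accuracy are illustrative assumptions.

```python
from typing import List, Sequence


def update_belief(
    belief: Sequence[float],
    transition: Sequence[Sequence[float]],
    observation_prob: Sequence[float],
) -> List[float]:
    """Standard Bayesian belief update for one action/observation pair.

    belief[s]            -- current probability of state s
    transition[s][s2]    -- P(s2 | s, a) for the action that was taken
    observation_prob[s2] -- P(o | s2, a) for the observation that was received
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each state by how well it explains the observation.
    unnormalized = [observation_prob[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state example: the state does not change when the agent
# listens, and the observation points to the true state 85% of the time.
stay = [[1.0, 0.0], [0.0, 1.0]]
hear_state_0 = [0.85, 0.15]

b1 = update_belief([0.5, 0.5], stay, hear_state_0)
b2 = update_belief(b1, stay, hear_state_0)
```

Repeating the same observation makes the belief more confident in state 0, which illustrates why recent history and the belief state carry complementary information about a hidden state.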

【Abstract】 Considerable attention has been devoted to using communication to improve coordination in multi-agent robotic systems, in both the multi-agent system and multi-robot system fields. How to share information among multiple robots through communication is a key technique for coordination and cooperation. First, three communication methods in decentralized control systems are introduced. Then the major topics and the state of the art of communication in cooperation are summarized and reviewed. The methods for modelling communicative decentralized control systems are described, and their advantages and shortcomings are analyzed and compared. The following problems are then studied in more depth.

When communication is free, a centralized control model for the coordination of the multi-agent robotic system is established after the communication cost is parameterized. The presence of free communication reduces the computational complexity of multi-agent POMDPs to that of single-agent POMDPs. This thesis proposes a novel approximate algorithm, called Memetic algorithm based Q-Learning (MA-Q-Learning), for solving POMDP problems with uncertainty. The policies are evolved using memetic algorithms, whereas the improved Q-learning obtains predictive rewards that indicate the fitness of the evolved policies. To solve the hidden state problem, historical information is combined with the current belief state to aid in finding the optimal policy. Search efficiency is improved by a hybrid search method, in which an adjustment factor helps maintain population diversity and guides the combination of multiple kinds of crossover and mutation. Experiments on benchmark problems show that the proposed method is superior to other state-of-the-art approximate POMDP methods. Finally, experiments on the coordination of a multi-agent system validate the algorithm's effectiveness.

Although free communication reduces the computational complexity of multi-agent POMDPs to that of single-agent POMDPs, in practice communication is not free, and reducing the amount of communication is often desirable. To reduce the amount of communication needed to coordinate a multi-agent robotic system, this thesis presents a novel approach for making communication decisions in a decentralized fashion, in which the possible joint beliefs of the team are represented by a directed acyclic graph. An agent chooses to communicate only when its local observations indicate that sharing information would lead to an increase in expected reward. The thesis describes how to apply centralized single-agent policies to decentralized multi-agent POMDPs by maintaining and reasoning over the possible joint beliefs of the team. Experiments and a detailed example show that the proposed DAG-DEC-COMM algorithm reduces communication while improving the performance of distributed execution.

Unreliable communication is a common feature of many real-world multi-agent applications, especially multi-agent robot systems. Limited bandwidth, interference, and loss of line of sight are some of the reasons why communication can fail. We introduce an improved Adopt algorithm that operates effectively over unreliable communication infrastructure in the context of the Distributed Constraint Optimization Problem (DCOP). The key idea is to have the improved algorithm eliminate unnecessary communication, with an adaptive timeout mechanism ensuring the liveness needed to find the optimal solution. The number of messages exchanged is thereby decreased. Furthermore, the adaptive timeout allows the algorithm to deal with message loss flexibly and robustly. Results show that, with a few modifications, Adopt can be guaranteed to terminate with the optimal solution even in the presence of message loss, and that time to solution degrades gracefully as the message loss probability increases. The results also suggest that artificially introducing message loss, even when the communication infrastructure is reliable, could be beneficial in terms of the amount of work agents need to do to find the optimal solution.

Recent research has focused on multi-agent teamwork, and multi-agent human-robot teams have been applied in many fields. This thesis investigates and develops a multi-agent human-robot system based on mobile information devices. It first presents the architecture of a multi-agent human-robot team with mobile devices, then designs and implements the communication system between humans and robots and among robots, so that information is shared among the members of the team. Finally, experimental results show that the system is user-friendly and can effectively undertake remote monitoring and control tasks, and that the robot members of the team can construct a world model by communicating each member's local environment information, which benefits team coordination.
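The rule stated in the abstract, communicating only when local observations indicate that sharing would raise the expected reward, can be illustrated with a small value-of-communication test. This is a simplified sketch under stated assumptions and not the DAG-DEC-COMM algorithm itself: it compares one belief the team can infer without a message against the agent's own better-informed belief, instead of reasoning over a graph of possible joint beliefs. The names `should_communicate`, `Q_TABLE`, and `comm_cost`, and the reward values, are hypothetical.

```python
from typing import Sequence

# Hypothetical value table, Q_TABLE[action][state], for a two-state problem.
# Action 0 gathers more information; actions 1 and 2 commit to one state each.
Q_TABLE = [
    [-1.0, -1.0],     # action 0: wait and observe
    [-100.0, 10.0],   # action 1: good only in state 1
    [10.0, -100.0],   # action 2: good only in state 0
]


def expected_value(belief: Sequence[float], q_row: Sequence[float]) -> float:
    """Expected value of one action under a belief over states."""
    return sum(p * q for p, q in zip(belief, q_row))


def best_action(belief: Sequence[float], q: Sequence[Sequence[float]]) -> int:
    """Index of the action with the highest expected value under the belief."""
    return max(range(len(q)), key=lambda a: expected_value(belief, q[a]))


def should_communicate(
    team_belief: Sequence[float],
    own_belief: Sequence[float],
    q: Sequence[Sequence[float]],
    comm_cost: float,
) -> bool:
    """Return True when sharing is expected to pay for itself.

    team_belief -- what teammates can infer without a message (assumption)
    own_belief  -- belief after including this agent's private observations
    """
    action_if_silent = best_action(team_belief, q)
    action_if_shared = best_action(own_belief, q)
    # Both actions are scored under the better-informed belief.
    gain = expected_value(own_belief, q[action_if_shared]) - expected_value(
        own_belief, q[action_if_silent]
    )
    return gain > comm_cost


# Strong private evidence changes the team's best action, so sharing is worth it.
strong_evidence = should_communicate([0.5, 0.5], [0.97, 0.03], Q_TABLE, 1.0)
# Weak evidence leaves the best action unchanged, so the agent stays silent.
weak_evidence = should_communicate([0.5, 0.5], [0.6, 0.4], Q_TABLE, 1.0)
```

In this sketch the gain is never negative, so the communication cost alone determines how much private evidence an agent must accumulate before it sends a message.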
