Research on Real-Time Planning Systems in Dynamic Uncertain Environments

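【Author】 Li Xiang (李响)

【Advisor】 Chen Xiaoping (陈小平)

【Author information】 University of Science and Technology of China, Computer Applications, 2004, Ph.D.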

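【Summary】 Planning is an important and common intelligent behavior and capability. It is a major field of artificial intelligence research and has long been one of its central problems. Planning in dynamic uncertain environments is closer to real-world conditions and has greater practical value, so it has become a focus of current planning research.

This thesis first analyzes the main characteristics of dynamic uncertain environments:
■ Dynamics: the state of the environment changes continually. It changes not only under the influence of the agent itself but also under the influence of other agents and other factors in the environment.
■ Limited agent knowledge: in general, an agent cannot possess all knowledge about the environment, cannot know every factor that may change it, and cannot know everything about other agents. It can hold only part of this knowledge and may know nothing at all about some aspects.
■ Uncertain action outcomes: when an agent performs an action in the environment, the result is uncertain and cannot be predicted accurately in advance.
■ Partial observation: in general, an agent's observation of the environment is incomplete. At any given moment it can observe only part of the environment.
■ Uncertain observation: the observations an agent obtains from the environment are generally inaccurate and are sometimes wrong.

The thesis then surveys how well existing planning systems adapt to such environments and analyzes their respective strengths and weaknesses.

Building on this analysis, the main contribution of the thesis is POMDPRS, a planning system for dynamic uncertain environments based on PRS and decision-theoretic planning, together with two methods for improving decision efficiency. The specific work is as follows:

1) POMDPRS, a planning system for dynamic uncertain environments, is proposed. Its basic model is described and a formal description is given. POMDPRS retains the continual planning mechanism of PRS to cope with the dynamics of the environment, and uses a probability distribution over the environment state space as the agent's belief to cope with uncertainty, so it addresses both requirements.

2) The use of factored state representation in POMDPRS is described, and a formal description of the factored POMDPRS, FPOMDPRS, is given. POMDPRS uses a probability distribution over the state space as the agent's belief and updates it according to the actions the agent performs and the observations it receives. In many cases, however, the state space is very large, which makes belief updating too slow for real-time response. The factored approach partitions the environment attributes that make up the state according to their mutual dependencies. A state is represented as a set of sub-states, so one large state space becomes several smaller ones, and the belief becomes a set of probability distributions over the sub-state spaces. During belief updating these distributions are processed separately, which reduces the time cost of the update.

3) The use of Monte Carlo filtering in POMDPRS is described, and a formal description of the POMDPRS with Monte Carlo filtering, MCPOMDPRS, is given. Monte Carlo filtering is another way to reduce the time cost of belief updating. It represents the whole distribution by a finite set of concrete values (samples) and updates this sample set with the SIR method according to actions and observations. The time cost of belief updating then depends on the size of the sample set, so it can be controlled by controlling that size.

Factorization and Monte Carlo filtering can be combined in POMDPRS: the state is factored first, and the Monte Carlo method is then applied to the sub-state sets that are still large, which further improves the efficiency of belief updating. Finally, the thesis describes P-DOG, a robot decision and control system that combines FPOMDPRS and MCPOMDPRS and runs on a physical robot, and reports experimental results that verify the feasibility of POMDPRS and its variants.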
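The belief update that the summary describes, and its factored variant, can be illustrated with a minimal sketch. This is not the thesis's formalization: the function names, the table layout `T[action][s][s2]` and `O[action][s2][o]`, and the two-state example are assumptions made here for illustration, and the factored version additionally assumes that each factor evolves and is observed independently of the others.

```python
def belief_update(belief, action, observation, T, O):
    """One exact Bayes-filter step over a discrete state space.

    belief: list of probabilities, one per state
    T[action][s][s2]: probability of moving from state s to state s2
    O[action][s2][o]: probability of observing o after arriving in s2
    """
    n = len(belief)
    # Prediction: push the current belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]


def factored_belief_update(beliefs, action, observations, models):
    """Update each sub-state belief on its own (assumes independent factors).

    beliefs: one belief list per factor
    observations: one observation index per factor
    models: one (T, O) pair per factor
    """
    return [
        belief_update(b, action, o, T, O)
        for b, o, (T, O) in zip(beliefs, observations, models)
    ]


# Hypothetical two-state model with a single action "stay".
T = {"stay": [[0.9, 0.1], [0.2, 0.8]]}
O = {"stay": [[0.8, 0.2], [0.3, 0.7]]}
posterior = belief_update([0.5, 0.5], "stay", 0, T, O)
print(posterior)
```

In this sketch the exact update costs on the order of |S|² operations per step. Two independent factors of 100 states each would need about 2 × 100² operations, against (100 × 100)² for the joint space, which is the kind of saving the factored representation is aiming for.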

【Abstract】 Planning is an important and common intelligent activity and capability, and it is one of the key fields of artificial intelligence research, where it has received attention from early on. Planning in dynamic nondeterministic environments has become a focus of current research, because such environments are closer to reality and research on them has greater practical value.

This thesis begins by analyzing the main characteristics of dynamic nondeterministic environments:
■ Dynamic environment: the environment changes continually. It is affected not only by the agent itself but also by other agents and other factors in the environment.
■ Limited knowledge of agents: in general, an agent does not have complete knowledge of the environment it lives in. It cannot know every factor that can affect the environment, and it cannot know other agents completely. An agent holds only part of this knowledge and may be entirely ignorant of some aspects.
■ Nondeterministic results of actions: agents perform actions in the environment, but the results of these actions are nondeterministic and cannot be predicted accurately.
■ Partial observation: in general, an agent's observation of the environment is partial. At any time, an agent can observe only part of the state of the environment.
■ Nondeterministic observation: an agent's observations of the environment may be inaccurate or even completely wrong.

Existing planning systems are then reviewed with respect to their ability to adapt to dynamic nondeterministic environments, and their advantages and shortcomings are pointed out.

Building on this analysis, the main work of the thesis is a new planning system, POMDPRS, which is better suited to dynamic nondeterministic environments. It is based on PRS and decision-theoretic planning. The thesis also discusses two approaches to improving the efficiency of decision making. The details are as follows:

1) A new planning system, POMDPRS, which is better suited to dynamic nondeterministic environments, is introduced. Its basic model and formal description are provided. POMDPRS preserves the continual planning mechanism of PRS to cope with the dynamic character of the environment, and represents the belief of an agent as a distribution over the state space to cope with its nondeterministic character. POMDPRS therefore satisfies both requirements.

2) The factored representation of states in POMDPRS is introduced, and a formal description of the factored POMDPRS, FPOMDPRS, is provided. POMDPRS represents the belief of an agent as a distribution over the state space and updates the belief with the actions the agent performs and the observations it receives. Unfortunately, in many cases the state space is very large, which makes belief updating very time-consuming, so it does not meet the real-time requirements of the system. The factored approach divides the properties of the environment that make up a state into groups according to their dependency relations. This turns one large state space into several smaller sub-state spaces, and the belief is likewise transformed into several distributions over those sub-state spaces. Belief updating is then performed in the sub-state spaces independently, which reduces its time cost.

3) The Monte Carlo filter in POMDPRS is introduced, and a formal description of the Monte Carlo filter POMDPRS, MCPOMDPRS, is provided. The Monte Carlo filter is another way to reduce the time cost of belief updating. It uses a limited number of values (known as samples) to describe the whole distribution, and the sample set is updated by the SIR method using actions and observations. The time cost then depends on the size of the sample set, so the time cost of belief updating can be limited by controlling that size.

Factored states and the Monte Carlo filter can be combined in POMDPRS. First the state is factored; then the Monte Carlo filter is applied to the sub-state sets that are still large. In this way the efficiency of belief updating can be improved further. The thesis also describes a robot control system called P-DOG, which runs on real robots and is an instance of the combination of FPOMDPRS and MCPOMDPRS. Experiments with this system on a real robot demonstrate the feasibility of POMDPRS and its variants.
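The Monte Carlo filter step mentioned above can be sketched as a small particle filter. SIR is commonly expanded as sampling importance resampling, and the code below follows that three-stage reading. It is a hedged illustration, not the MCPOMDPRS formalization or the P-DOG implementation: `sir_update`, `sample_transition`, `obs_likelihood`, and the five-cell corridor model with its probabilities are all hypothetical names and values chosen for this example.

```python
import random


def sir_update(particles, action, observation, sample_transition, obs_likelihood, rng=random):
    """One sampling / importance-weighting / resampling step over a particle set."""
    # Sampling: propagate every particle through the stochastic action model.
    proposed = [sample_transition(s, action, rng) for s in particles]
    # Importance weighting: score each proposal by the observation likelihood.
    weights = [obs_likelihood(observation, s) for s in proposed]
    if sum(weights) == 0.0:
        raise ValueError("observation has zero likelihood for every particle")
    # Resampling: draw a new set of the same size, proportional to the weights.
    return rng.choices(proposed, weights=weights, k=len(particles))


# Hypothetical 1-D corridor with cells 0..4 (an assumed toy model).
def sample_transition(state, action, rng):
    # The "right" action succeeds with probability 0.8; otherwise the robot stays put.
    if action == "right" and rng.random() < 0.8:
        return min(state + 1, 4)
    return state


def obs_likelihood(observation, state):
    # The sensor reports the true cell with probability 0.7, any other cell with 0.075.
    return 0.7 if observation == state else 0.075


rng = random.Random(0)
particles = [1] * 500  # belief: the robot is certainly in cell 1
particles = sir_update(particles, "right", 2, sample_transition, obs_likelihood, rng)
print(sum(1 for s in particles if s == 2) / len(particles))
```

The work per update grows with the number of particles rather than with the size of the state space, so choosing the size of `particles` is the knob that bounds update time, at the price of a coarser approximation of the belief.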

  • 【Classification code】TP18
  • 【Citation count】1
  • 【Download count】308