基于多Agent的突发事件信息智能监测系统研究

Research on Emergency Information Intelligent Monitoring System Based on Multi-Agent

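【Author】 Wang Su (王肃)

【Advisor】 Du Junping (杜军平)

【Author information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2011, Ph.D.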

【Abstract】 Emergencies are characterized by randomness, suddenness, and harmfulness. On the Internet, information about emergencies surfaces through news, comments, posts, and replies. It spreads quickly, comes from diverse sources, and is interactive, which makes monitoring and processing it complex and demanding. This dissertation studies key technologies for collecting, processing, tracking, and analyzing emergency information and implements them as agents on an extended JADE platform, so that the resulting monitoring system is automated, distributed, and intelligent. The main contributions and innovations are as follows:

(1) Based on an analysis of the requirements of emergency information processing, key technologies such as information collection, URL deduplication, and information extraction are studied in a distributed environment. The Focusing Double-High Pages algorithm (FDHP) is proposed. It considers the topic relevance and topic quality of a web page together with the confidence of the URLs the page contains, so that the crawler collects topic pages of high quality and high relevance. The Sectional RP algorithm (SRP) is proposed to improve the efficiency of the original RP algorithm; it performs lookup and deduplication of massive numbers of URLs efficiently in a distributed environment. The Marking-Cleaning-Statistics-Abstraction method (MCSA) is proposed. It marks and cleans the tags in a web page, computes statistics over groups of text, and extracts the content. MCSA achieves a high F1 value and is suited to fast cleaning and extraction of web content in different languages.

(2) Building on double-high pages, the text classification model based on Query by Committee (QBC) is analyzed and an extended QBC method (EQBC) is proposed. EQBC lets unlabeled data points play a larger role, obtains good classification results from a small number of training samples, and converges faster. QBC and EQBC are compared using different classifiers. The experiments show that EQBC gives better classification results and yields emergency topic pages of higher topic quality and higher relevance.

(3) Building on topic tracking, emergency information is analyzed and a seven-tuple definition of an emergency scenario is given. The definition effectively describes and records features of an emergency such as its data, its interaction with the environment, its participants, and its list of behaviors. A scenario analysis framework should provide four functions: scenario acquisition, representation, mapping, and use. A rule and scenario-ontology-data schema mapping model (RSODMM) is established, the logical and conditional relations between scenarios are defined, and the composition and processing flow of the scenario analysis framework are described. Finally, a case study and experiments verify the effectiveness of the framework.

(4) A prototype of an intelligent emergency information monitoring system based on multiple agents is proposed and built, with concrete implementations for emergency information collection and processing, topic detection and tracking, and the scenario system. In the agent-based distributed information collection system, an agent-based distributed crawler is designed and implemented. It supports keyword-based and double-high-page-based crawling strategies to meet users' different requirements for emergency information. Bilingual information detection based on a topic keyword dictionary and a topic tracking system based on chronological order are implemented. For solidifying scenarios and realizing them as a scenario system, an implementation method that combines agents with scenarios is proposed, and the emergency scenario system is implemented using agent role analysis.

(5) The JADE platform is extended to integrate the related intelligent emergency applications, so that they can run and be deployed in a distributed manner on a larger platform. The system structure is divided into five layers. The communication transport layer, the system container layer, and the agent service layer form the supporting environment for multi-agent applications. The agent application layer integrates the intelligent applications. The user interface layer translates user requests into commands the system can understand, which the agent application layer then executes. The layered JADE support platform makes the application layer easier to extend and supports distributed running and management of multi-agent programs.
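
The abstract does not describe how EQBC extends QBC, so the following is only a minimal sketch of the standard Query by Committee selection step that contribution (2) starts from, not the dissertation's method. It assumes committee disagreement is measured by vote entropy; the three toy keyword classifiers, the label names, and the sample texts are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement of a committee on one sample: entropy of its label votes."""
    total = len(votes)
    counts = Counter(votes)
    # p * log(1/p) summed over labels; 0.0 when the committee is unanimous.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def select_query(committee, unlabeled):
    """Return the unlabeled sample on which the committee disagrees most."""
    return max(
        unlabeled,
        key=lambda sample: vote_entropy([member(sample) for member in committee]),
    )


# Hypothetical committee members: toy rules standing in for trained classifiers.
def by_quake(text):
    return "emergency" if "earthquake" in text else "other"


def by_rescue(text):
    return "emergency" if "rescue" in text else "other"


def by_length(text):
    return "emergency" if len(text.split()) > 6 else "other"


committee = [by_quake, by_rescue, by_length]
pool = [
    "earthquake rescue teams reach the collapsed school after a long night",
    "new phone released",
    "rescue dog adopted",
]

# The sample with the most divided vote is the one sent to a human for labeling.
query = select_query(committee, pool)
```

In this toy pool the committee is unanimous on the first two texts and split on the third, so the third is the one selected for labeling. In a real active-learning loop the committee would be retrained after each newly labeled sample.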

  • 【Classification codes】TP18; TP274
  • 【Citations】5
  • 【Downloads】649