

Topic Tracking of Accidental News Based on SVM

【Author】 王强 (Wang Qiang)

【Advisor】 张永奎 (Zhang Yongkui)

【Author information】 Shanxi University, Computer Application Technology, 2009, Master's degree

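【Abstract (translated from Chinese)】 The growth of the mobile Internet has brought people into an era of extremely abundant information. At the same time, the rapid and disorderly expansion of online information makes it increasingly difficult to find and manage the information that is valuable. Because emergencies (accidental events) are random and uncertain, the information available to decision-makers may be incomplete or late, and its accuracy and validity are hard to guarantee as it is fed back and processed, which leads to distorted information. How to obtain related reports, and information on how an accidental event develops, comprehensively and accurately is therefore a problem that needs to be solved. Topic detection automatically identifies the newest news topics in a stream of news reports and organizes the reports by topic in a timely manner; topic tracking follows a specific news topic. Applying topic detection and tracking can therefore manage and organize news information effectively and meet people's particular needs for news information. This thesis tracks follow-up reports of accidental events: given topics of interest specified by the user in advance, it filters large volumes of information in real time and produces the ongoing progress of the relevant topics, so that the full picture of an event can be grasped. News documents on accidental events are represented by a multi-vector space model built from several sub-vectors. After analyzing common text classification algorithms, a topic tracking system is implemented with an SVM-based classification method. To handle the drift of the topic itself during tracking, an improved topic tracking system is proposed: it detects and models the novel information contained in pseudo-relevance feedback during tracking and, on that basis, uses the multi-vector space model to adjust the topic space dynamically, so as to follow topic drift and reduce the miss rate. The main work of this thesis is: 1. Analyzing an accidental-event news corpus that had already been downloaded and processed, words are used as candidate features and the feature words are divided into five classes (person names, time expressions, place names, organization names, and content), forming five sub-vectors; news documents are represented by these five sub-vector space models. Time similarity and place similarity are computed from the distance between report times and from a degree-of-association measure, respectively, and the position of a feature word is taken into account when computing its weight. Finally, the information in accidental-event texts is divided into two kinds, objective information and subjective information, laying a theoretical foundation for further research. 2. For story link detection, multi-vector model construction is combined with an SVM-based classification algorithm, with good results. 3. To handle topic drift during tracking, a topic tracking system is built using an improved method based on core and novel parts. 4. An experimental system supporting story link detection and topic tracking is designed; it identifies follow-up reports of a given topic well. Finally, 10 topics comprising 260 reports were selected from the collected and processed accidental-event news corpus for comparative testing, to verify the feasibility and effectiveness of the proposed method. The experimental results show that the proposed method improves the efficiency of the accidental-event topic tracking system to some extent.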

【Abstract】 The mobile Internet has brought about an era of abundant information. However, the rapid and disorderly growth of online information makes valuable information increasingly hard to discover and manage. Because accidental events are random and uncertain, decision-makers may not have complete or timely information, and the accuracy and validity of information cannot be guaranteed during feedback and processing, which distorts it. How to obtain comprehensive and accurate reports of accidental events and of their evolution is a problem that needs to be addressed. Topic detection identifies new topics in a stream of news stories and organizes the stories by topic. Topic tracking follows given topics and retrieves the relevant stories from the news stream. Applying topic detection and tracking can therefore manage news information effectively. We track the follow-up stories of accidental events for topics that users have specified in advance, so that people can follow the latest evolution of an event. We build a multi-vector space model for accidental-event news. After analyzing text classification algorithms, we apply the SVM classification algorithm to topic tracking. To find and follow topic shift during tracking, this thesis proposes an improved topic tracking system that detects novel information in the tracking feedback and modifies the VSM-based topic model accordingly, in order to track topic shift effectively. The main work of this thesis is as follows: (1) By analyzing the processed corpus, we divide the information in accidental-event texts into two types, objective information and subjective information. Words are used as candidate features and divided into five categories (person names, times, place names, organization names, and content), forming five sub-vectors; documents are represented with a model of five sub-vector spaces, and the position of a word is given special consideration when its weight is calculated. (2) Link detection combines the multi-vector model with the SVM classification algorithm and achieves good results. (3) To handle topic shift in the tracking task, we build a topic tracking system based on an improved model of core and novel parts. (4) We design an experimental system for link detection and topic tracking that tracks the follow-up stories of accidental news effectively. Finally, we test on 10 topics, about 260 stories, from the accidental news corpus. The results show that the method improves the efficiency of tracking accidental events to some extent.
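As a rough illustration of the five-sub-vector representation described in the abstract, the sketch below compares two stories sub-vector by sub-vector and collects the five similarities into one feature vector, which is the kind of input a classifier such as an SVM could be trained on. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the raw term counts used as weights, and the use of cosine similarity for every sub-vector are all assumptions (the abstract states that time and place similarity use dedicated measures and that term weights depend on word position, neither of which is reproduced here). The `linear_decision` helper only shows the form of a linear SVM decision rule; its weights and bias would have to come from training.

```python
import math
from collections import Counter

# The five sub-vector categories named in the abstract.
SUBVECTORS = ("person", "time", "place", "organization", "content")


def build_doc(tagged_terms):
    """Build a multi-vector document from (term, category) pairs.

    Assumption: terms were already segmented and tagged upstream.
    Weights are raw counts, a stand-in for the position-aware weighting
    mentioned in the abstract.
    """
    doc = {name: Counter() for name in SUBVECTORS}
    for term, category in tagged_terms:
        doc[category][term] += 1
    return doc


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sub-vector carries no evidence
    return dot / (norm_a * norm_b)


def similarity_features(doc_a, doc_b):
    """Return one similarity per sub-vector, in SUBVECTORS order."""
    return [cosine(doc_a[name], doc_b[name]) for name in SUBVECTORS]


def linear_decision(features, weights, bias):
    """Form of a linear SVM decision: on-topic when w . x + b > 0.

    The weights and bias are placeholders here; in practice they would
    be learned from labeled story pairs.
    """
    return sum(w * x for w, x in zip(weights, features)) + bias > 0
```

For example, two stories that share a place name and half of their content terms would produce a feature vector with 1.0 in the place position, roughly 0.5 in the content position, and 0.0 for the empty person, time, and organization sub-vectors.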

  • 【Online publication contributor】 Shanxi University
  • 【Online publication year and issue】 2011, Issue S1

