网络舆情敏感话题发现平台的研究
Research on the Detection Platform of Sensitive Topic in Internet-Mediated Public Sentiment
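【Author】 Feng Ying (冯颖);
【Advisor】 Meng Siyi (孟嗣仪);
【Author information】 Beijing Jiaotong University, Communication and Information Systems, 2009, Master's thesis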
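【Abstract (translated from Chinese)】 The Internet is an important communication channel, and the information it stores and transmits, sensitive topics in particular, strongly shapes how public opinion forms and spreads; the potential security threat is hard to overstate. Active detection of sensitive topics has therefore become an urgent and important research problem. Centered on a sensitive-topic detection platform for online public opinion, this thesis systematically studies the key technologies of network information analysis and processing, chiefly word segmentation and structured storage of preprocessed network information and, built on these, sensitive-topic detection. The thesis designs and implements a platform that detects sensitive topics by matching segmentation results against a sensitive-word lexicon. To segment Chinese sensitive content, the system implements a Chinese lexical analysis method based on the cascaded hidden Markov model, which integrates word segmentation, segmentation disambiguation, unknown-word recognition, and part-of-speech tagging into one framework. The sensitive-word lexicon is managed with a linked list and serialization, which keeps the lexicon complete and transferable. For detection, the system uses a reverse recognition process: the processed topic is matched against the lexicon, that is, the segmentation results are looked up in the sensitive-word lexicon to identify sensitive topics, which improves detection efficiency. On this basis, the thesis explores several ways to improve the platform's performance: an experimental comparison of segmentation precision and recall between the full second-order hidden Markov model (FHMM2) and the hidden Markov model (HMM), which finds that FHMM2 has a clear advantage in statistical performance and precision; a segmentation dictionary based on a four-character hash mechanism, proposed as an improvement on existing dictionaries; and, for semantics-based detection, a method for sensitive-word recognition and sensitivity evaluation based on keywords and latent semantic indexing. Building on this work, the platform was designed and implemented, tested in the laboratory, and trial-run on the campus network; the results show that the system runs stably and performs well.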
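The matching step described in the abstract (looking up segmentation results in a sensitive-word lexicon) can be illustrated with a minimal Python sketch. It assumes the text has already been segmented into tokens; the cascaded HMM segmenter, the linked-list storage, and the thesis's actual scoring are not reproduced here. The function names, the `min_hits` threshold, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def find_sensitive_terms(tokens, lexicon):
    """Count the tokens that appear in the sensitive-word lexicon.

    `tokens` is the output of some word segmenter (assumed, not implemented
    here); `lexicon` is any container with fast membership tests.
    """
    return Counter(tok for tok in tokens if tok in lexicon)


def is_sensitive(tokens, lexicon, min_hits=1):
    """Flag a topic as sensitive when it has at least `min_hits` lexicon hits.

    The threshold rule is an assumption for this sketch, not the thesis's
    sensitivity measure.
    """
    hits = find_sensitive_terms(tokens, lexicon)
    return sum(hits.values()) >= min_hits


# Hypothetical lexicon and pre-segmented text.
lexicon = frozenset({"riot", "leak"})
tokens = ["the", "leak", "caused", "a", "riot", "after", "the", "leak"]
hits = find_sensitive_terms(tokens, lexicon)
```

Storing the lexicon in a hash-based set makes each lookup constant time on average, so the cost of matching grows with the number of tokens rather than with the size of the lexicon.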
【Abstract】 The Internet is an important communication channel, and the information it carries and transmits, especially sensitive topics, seriously influences the formation and dissemination of public opinion and poses an inestimable latent security threat. Technology for actively detecting sensitive topics is therefore urgently needed. The Detection Platform of Sensitive Topic in Internet-mediated Public Sentiment builds on the main techniques of network information analysis and processing: it segments preprocessed network information, stores it in structured form, and on that basis detects sensitive topics in Internet-mediated public sentiment. This thesis designs and implements a detection platform based on matching segmentation results against sensitive words. For the word segmentation module, the thesis adopts an approach to Chinese lexical analysis using the Cascaded Hidden Markov Model (CHMM), which integrates Chinese word segmentation, disambiguation, unknown-word recognition, and part-of-speech tagging into one theoretical framework. The system manages sensitive words through a singly linked list and serialization, which ensures the integrity and transferability of the sensitive-word lexicon. To detect sensitive topics, the system takes a reverse approach: it matches the processed topics against the sensitive words, that is, it looks up the segmentation results in the sensitive-word table and thereby identifies sensitive topics, which increases detection efficiency. Based on this work, the thesis makes a preliminary exploration of ways to improve the system's performance. First, it compares the precision and recall of segmentation using the Full Second-order Hidden Markov Model (FHMM2) and the Hidden Markov Model (HMM) experimentally, and concludes that FHMM2 has an obvious advantage in statistical effectiveness and precision. Second, to improve on existing segmentation dictionaries, it proposes a segmentation dictionary based on a four-character hash mechanism. Third, to detect sensitive topics using semantic information, it presents a method for sensitive-word recognition and sensitivity evaluation using keywords and Latent Semantic Indexing. Drawing on all of this work, the thesis designs and implements the Detection Platform of Sensitive Topic in Internet-mediated Public Sentiment. In test runs in the laboratory and on the campus network, the system proved to be efficient and stable.
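The abstract compares segmenters by precision and recall without stating how the two were computed. One common convention, assumed here rather than taken from the thesis, scores a predicted word as correct only when its character span exactly matches a word in the reference segmentation. A minimal Python sketch, with hypothetical function names and a made-up example sentence:

```python
def to_spans(words):
    """Convert a word sequence into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_scores(gold_words, pred_words):
    """Return (precision, recall) of a predicted segmentation.

    A predicted word counts as correct only if the reference segmentation
    contains a word with exactly the same character span.
    """
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: the same six characters segmented two ways.
gold = ["研究", "生命", "起源"]
pred = ["研究生", "命", "起源"]
precision, recall = segmentation_scores(gold, pred)
```

In this example only the last word has a matching span, so one of three predicted words is correct and one of three reference words is recovered.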
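- 【Online publication contributor】 Beijing Jiaotong University 【Online publication year and issue】 2009, Issue 11
- 【Classification number】 TP391.1
- 【Citation count】 22
- 【Download count】 1561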