The Research and Application of Key Technology in Information Mining of Network Public Opinion
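【Author】 Dai Lin (戴霖)
【Supervisor】 Li Xiaojun (厉小军)
【Author information】 Zhejiang Gongshang University (浙江工商大学), Computer Application Technology, 2011, Master's thesis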
【Abstract】 Along with the rapid development of the Internet, more and more people express their opinions, ideas, feelings, and attitudes online. This content includes positive information that benefits the development of events as well as negative information that aggravates them. At the same time, the openness, directness, and anonymity of the network give online public opinion a growing influence on people's ideology. Timely and effective mining of the large volume of public opinion information therefore has practical significance for maintaining social stability and promoting national development.

Network public opinion information mining is closely related to Natural Language Processing (NLP). Constrained by the state of NLP technology, traditional public opinion mining has focused mainly on topic recognition and related content, and has paid little attention to the emotional factors in public opinion. In recent years, shallow semantic analysis has emerged and, in related applied research, has proven more intelligent and practical than part-of-speech tagging and syntactic analysis. Shallow semantic analysis is a simplified form of semantic analysis that formally represents the meaning of a sentence centered on its verb. Building on related NLP techniques and a comparative analysis of existing public opinion analysis algorithms, this thesis researches and experiments with public opinion mining techniques and applies the results in a network public opinion monitoring and analysis system. The main contents are as follows.

(1) Introduction to NLP technology. Because NLP plays an important role in network public opinion information mining, Chapter 2 briefly introduces its key components.
(2) Hot topic recognition. After word segmentation and part-of-speech tagging with ICTCLAS, a hot topic recognition method combining text keyword extraction and text clustering is proposed. Because public opinion text is highly time-sensitive, the segmentation error rate for unknown (out-of-vocabulary) words is high; joining adjacent segments according to word co-occurrence probability effectively improves the segmentation of unknown words. Keyword extraction also takes the positional weight of words into account.
(3) Text tendency analysis. Text tendency (sentiment orientation) information is mined by combining Semantic Role Labeling (SRL), a form of shallow semantic analysis, with the construction of an emotional lexicon. Statistical analysis of SRL samples yields a role-feature probability table and a role-emotion probability table, which support the choice of the order in which roles are extracted. The emotional lexicon is built by combining manual labeling with automatic expansion; experiments on character-based computation of emotional word tendency led to an improved automatic lexicon expansion method.
(4) Design and implementation of a network public opinion monitoring and analysis system. Based on the characteristics of network public opinion information, the overall system framework is proposed and the main modules are briefly introduced.

The work in this thesis has been applied in the network public opinion monitoring and analysis system, where it effectively assists public opinion monitoring and reduces human intervention, and it is expected to play a positive role in future network information management.
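The abstract says unknown-word segmentation is improved by joining segments with high co-occurrence probability, but it does not give the exact statistic or threshold. The following is a minimal Python sketch under stated assumptions: two adjacent tokens are treated as one word when the pair occurs at least `min_count` times and accounts for at least `threshold` of the occurrences of each token. The function name `cooccurrence_merge`, both defaults, and the toy data are hypothetical; the input is assumed to be token lists already produced by a segmenter such as ICTCLAS.

```python
from collections import Counter
from typing import List


def cooccurrence_merge(
    docs: List[List[str]], threshold: float = 0.8, min_count: int = 2
) -> List[List[str]]:
    """Join adjacent tokens that almost always occur together (hypothetical rule)."""
    unigram: Counter = Counter()
    bigram: Counter = Counter()
    for tokens in docs:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))

    def should_join(left: str, right: str) -> bool:
        pair = bigram[(left, right)]
        if pair < min_count:
            return False
        # Require the adjacent pair to account for at least `threshold`
        # of the occurrences of each token.
        return pair / unigram[left] >= threshold and pair / unigram[right] >= threshold

    merged_docs = []
    for tokens in docs:
        merged: List[str] = []
        prev = None  # last *original* token, so counts stay defined after a merge
        for tok in tokens:
            if prev is not None and should_join(prev, tok):
                merged[-1] += tok  # extend the current word; chains are allowed
            else:
                merged.append(tok)
            prev = tok
        merged_docs.append(merged)
    return merged_docs


# "网"/"购" are fragments of the new word "网购" (online shopping).
docs = [["网", "购", "很", "方便"], ["我", "喜欢", "网", "购"], ["网", "购", "节"]]
print(cooccurrence_merge(docs))
# → [['网购', '很', '方便'], ['我', '喜欢', '网购'], ['网购', '节']]
```

Requiring the ratio on both sides keeps frequent function words from being glued onto their neighbors, since such words co-occur with many different partners.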
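The abstract states only that positional weight is considered in keyword extraction. As an illustration, the sketch below assumes a TF-IDF style score in which each occurrence of a term is weighted by where it appears (title, lead sentence, or body). The weight values in `POSITION_WEIGHTS`, the function name, and the use of TF-IDF itself are assumptions, not the thesis's formula; the IDF uses the smoothed form `ln((N+1)/(df+1)) + 1`.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical position weights: title words count more than body words.
POSITION_WEIGHTS = {"title": 3.0, "lead": 1.5, "body": 1.0}


def position_weighted_keywords(
    doc: Dict[str, List[str]], corpus: List[Set[str]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank terms by position-weighted term frequency times smoothed IDF."""
    weighted_tf: Dict[str, float] = defaultdict(float)
    for position, tokens in doc.items():
        for term in tokens:
            weighted_tf[term] += POSITION_WEIGHTS[position]

    n_docs = len(corpus)

    def idf(term: str) -> float:
        df = sum(1 for d in corpus if term in d)  # document frequency
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    scores = {term: tf * idf(term) for term, tf in weighted_tf.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy news item: 地铁 = subway, 涨价 = price increase, 票价 = fare, 市民 = citizens.
doc = {
    "title": ["地铁", "涨价"],
    "lead": ["地铁", "票价", "上涨"],
    "body": ["市民", "关注", "地铁", "票价"],
}
corpus = [
    {t for tokens in doc.values() for t in tokens},
    {"市民", "天气"},
    {"市民", "关注", "交通"},
]
print([term for term, _ in position_weighted_keywords(doc, corpus)])
# → ['地铁', '涨价', '票价']
```

Here "涨价" outranks "票价" even though it occurs only once, because its single occurrence is in the title; this is the effect the positional weight is meant to capture.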
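The hot topic method combines keyword extraction with text clustering, but the abstract does not name the clustering algorithm. Purely as an example, the sketch below substitutes single-pass incremental clustering, a technique often used in topic detection: each document's keyword vector joins the most similar existing cluster if cosine similarity reaches a threshold, and otherwise starts a new cluster; larger clusters are treated as hotter topics. The algorithm choice, the threshold, and all names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(doc_keywords: List[List[str]], threshold: float = 0.3) -> List[Dict]:
    """Cluster keyword lists in one pass; return clusters sorted by size (hottest first)."""
    clusters: List[Dict] = []  # each: {"centroid": Counter, "members": [doc index]}
    for i, keywords in enumerate(doc_keywords):
        vec = Counter(keywords)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # summed counts; cosine ignores scale
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return sorted(clusters, key=lambda c: len(c["members"]), reverse=True)


# Keyword lists for five short posts (subway fare rise vs. cold weather).
posts = [
    ["地铁", "涨价", "票价"],
    ["天气", "降温"],
    ["地铁", "票价", "听证"],
    ["降温", "寒潮"],
    ["地铁", "涨价"],
]
print([c["members"] for c in single_pass_cluster(posts)])
# → [[0, 2, 4], [1, 3]]
```

Single-pass clustering suits streaming public opinion data because each new post is compared only against current cluster centroids, but its result depends on input order, which a production system would need to account for.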
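The role-feature and role-emotion probability tables can be illustrated with a simple frequency estimate. The sketch below assumes each annotated SRL sample is reduced to a triple `(role_label, contains_feature_word, contains_emotion_word)`, estimates per-role probabilities, and orders roles for extraction by descending probability. The PropBank-style labels (`A0`, `A1`, `AM-ADV`), the triple format, and the toy samples are hypothetical; the thesis's actual annotation scheme and statistics are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, bool, bool]  # (role label, has feature word, has emotion word)


def role_probability_tables(samples: List[Sample]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Estimate P(feature | role) and P(emotion | role) from labeled samples."""
    total: Dict[str, int] = defaultdict(int)
    feature: Dict[str, int] = defaultdict(int)
    emotion: Dict[str, int] = defaultdict(int)
    for role, has_feature, has_emotion in samples:
        total[role] += 1
        feature[role] += int(has_feature)
        emotion[role] += int(has_emotion)
    role_feature = {r: feature[r] / total[r] for r in total}
    role_emotion = {r: emotion[r] / total[r] for r in total}
    return role_feature, role_emotion


def extraction_order(table: Dict[str, float]) -> List[str]:
    """Try the roles most likely to hold the target first."""
    return sorted(table, key=lambda r: table[r], reverse=True)


samples = [
    ("A0", True, False), ("A0", True, False), ("A0", False, False),
    ("A1", True, True), ("A1", False, True),
    ("AM-ADV", False, True), ("AM-ADV", False, False),
]
role_feature, role_emotion = role_probability_tables(samples)
print(extraction_order(role_feature))  # → ['A0', 'A1', 'AM-ADV']
print(extraction_order(role_emotion))  # → ['A1', 'AM-ADV', 'A0']
```

Keeping two separate tables lets the extractor search for the opinion target and the emotional expression in different role orders.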
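Character-based tendency computation for lexicon expansion can be sketched as follows, assuming one common scoring scheme rather than the thesis's improved method, which the abstract does not detail. Each character receives a score `(p - n) / (p + n)`, where `p` and `n` are its normalized frequencies in the positive and negative seed words; a candidate word's tendency is the mean score of its known characters, and words beyond a hypothetical `threshold` are added to the lexicon. The seed lists, candidates, and function names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def char_polarity(pos_seeds: List[str], neg_seeds: List[str]) -> Dict[str, float]:
    """Score each seed character in [-1, 1] from its positive/negative frequency."""
    pos_counts = Counter(ch for w in pos_seeds for ch in w)
    neg_counts = Counter(ch for w in neg_seeds for ch in w)
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    scores = {}
    for ch in set(pos_counts) | set(neg_counts):
        p = pos_counts[ch] / n_pos
        n = neg_counts[ch] / n_neg
        scores[ch] = (p - n) / (p + n)  # p + n > 0: ch occurs in some seed
    return scores


def word_polarity(word: str, scores: Dict[str, float]) -> float:
    """Average the scores of characters seen in the seeds; unknown words score 0."""
    known = [scores[ch] for ch in word if ch in scores]
    return sum(known) / len(known) if known else 0.0


def expand_lexicon(
    candidates: List[str], scores: Dict[str, float], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    pos, neg = [], []
    for word in candidates:
        s = word_polarity(word, scores)
        if s >= threshold:
            pos.append(word)
        elif s <= -threshold:
            neg.append(word)
    return pos, neg


scores = char_polarity(["美好", "快乐", "满意"], ["悲伤", "痛苦", "不满"])
print(expand_lexicon(["美满", "苦恼", "满足"], scores))
# → (['美满'], ['苦恼'])
```

Because "满" occurs in both a positive seed (满意) and a negative seed (不满), it scores 0 and "满足" stays unclassified; ambiguous characters like this are one reason a purely character-based expansion needs refinement.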
【Key words】 network public opinion; information mining; monitoring and analyzing; hot topic recognition; text tendency analysis; semantic role labeling;