The Research on the Discovery and Polarity of Hot Topics of Public Opinion (网络舆情热点信息发现及其倾向性研究)
【Author】 Li Hailin (李海林);
【Advisor】 Nie Guihua (聂规划);
【Author information】 Wuhan University of Technology, International Trade, 2010, Master's thesis
【Abstract】 With the development of information technology and the spread of the Internet, the network has become the main channel through which the public obtains information, and an important platform for posting comments and expressing public opinion. News topics and user comments on the Internet are growing rapidly, which raises several hard problems: how to collect, from massive volumes of data, the information that meets a specific need; how to organize Internet information into effective machine-readable data; and how to separate useful from useless information in the collected data. Internet public opinion is the sum of the political beliefs, attitudes, opinions, and emotions that the public expresses through the Internet about government administration and about phenomena and problems in society. Internet public opinion and social public opinion interact with and influence each other: they are consistent in content and form, and Internet public opinion can, to some extent, shape the trend of social public opinion, with considerable social impact. Government departments therefore need some capacity to monitor Internet public opinion, so that they can promptly identify the hot issues the public cares about in a given period, understand public views and attitudes toward hot events, make sound decisions, and proactively guide public opinion.
Building on a review of current research on discovering hot information in Internet public opinion and on analyzing its orientation (polarity), this thesis starts from the sources of public opinion information and designs a detailed collection process. For the hot information that concerns both the general public and government departments, the thesis establishes criteria for judging hot information based on its concept and characteristics, quantifies those characteristics, builds a mathematical model, and describes the discovery and acquisition of hot information algorithmically. For orientation analysis, the thesis first builds a polarity dictionary by hand and then expands and revises it; it further analyzes how out-of-vocabulary words, negation words, and intensifying adverbs affect the polarity of the original polarity words, and proposes corresponding solutions. Ordinary text is represented as a vector, and the feature terms of a text are selected by computing the weights of feature words. Because Chinese sentences are delimited by punctuation marks, the thesis parses each sentence syntactically, extracts the dependency relations between words, and tags each word with its part of speech. The thesis builds semantic templates and determines the semantic pattern of a sentence by template matching; it computes the polarity value of each word from the polarity dictionary, then combines syntactic analysis and pattern matching to obtain the word's contextual polarity. The orientation of a sentence is determined by its topic words, its polarity words, and their polarity values; the orientation of a text is computed from the orientations of its sentences and the weight of each sentence in the whole text. Finally, simulation experiments are carried out on the proposed approach, and the experimental results are discussed and analyzed.
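The abstract outlines a lexicon-based pipeline: word polarity from a dictionary, adjusted by negation words and intensifying adverbs, then combined into sentence and text orientation using sentence weights. The following is a minimal sketch of that general idea, not the thesis's method. It swaps in a simple left-to-right token scan for the dependency parsing and semantic-template matching the abstract describes, uses English tokens for readability, and every lexicon entry, numeric value, and function name below is a hypothetical assumption.

```python
# Toy lexicons. All entries and values are illustrative assumptions,
# not taken from the thesis.
POLARITY = {"good": 1.0, "support": 0.8, "bad": -1.0, "angry": -0.9}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}


def word_polarities(tokens):
    """Return the contextual polarity of each polarity word in `tokens`.

    Simplification: negators and intensifiers seen since the previous
    polarity word modify the next polarity word, then are reset.
    """
    scores = []
    sign, scale = 1.0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            sign = -sign                      # each negator flips the sign
        elif tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]        # intensifiers scale the magnitude
        elif tok in POLARITY:
            scores.append(sign * scale * POLARITY[tok])
            sign, scale = 1.0, 1.0            # modifiers are consumed
    return scores


def sentence_polarity(tokens):
    """Sum the contextual polarities of the polarity words in one sentence."""
    return sum(word_polarities(tokens))


def text_polarity(sentences, weights=None):
    """Weighted average of sentence polarities.

    `weights` holds one non-negative weight per sentence; uniform
    weights are assumed when it is omitted.
    """
    if weights is None:
        weights = [1.0] * len(sentences)
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(w * sentence_polarity(s) for w, s in zip(weights, sentences))
    return weighted / total


if __name__ == "__main__":
    doc = [["very", "good"], ["not", "bad"]]
    print(text_polarity(doc, weights=[2.0, 1.0]))
```

In this sketch a higher weight could stand for a sentence in a prominent position such as the title or opening paragraph; the abstract does not say how the thesis assigns sentence weights, so that choice is left to the caller.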
【Key words】 Internet public opinion; Hot topic; Polarity word; Text feature;