
Research and Implementation of Key Technologies for Network Public Opinion Analysis

【Author】 Wu Yu (吴娱)

【Supervisor】 She Kun (佘堃)

【Author information】 University of Electronic Science and Technology of China, Computer Software and Theory, 2011, Master's degree

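【Abstract (translated from Chinese)】 With the rapid development of computer and communication technology, the Internet has become an indispensable part of people's lives. According to ITU statistics, as of December 2010 the number of Internet users worldwide had exceeded 2 billion, of whom more than 390 million were in China. The Internet is widely recognized as the "fourth medium" after newspapers, radio, and television, and the public's rights to know, to express, to participate, and to supervise have largely been realized online. Internet users actively voice opinions on hot topics such as business, people's livelihood, government administration, anti-corruption, and social morality. These opinions create strong public pressure whose influence now far exceeds that of traditional media, and the Internet has become the main carrier of public opinion. As online public opinion grows rapidly, analyzing and monitoring it becomes ever more important. The openness and relative freedom of the Internet release public speech from the control and restrictions of the social power structure, so people can express their views, positions, and emotions without reservation, and public opinion flows more freely. The virtual nature of the Internet also brings considerable security risks: speakers' identities are concealed, and rules and effective oversight are lacking, so the Internet easily becomes a place for some users to vent negative emotions. Moreover, China is currently in a period of social transition with many contradictions, and a small number of social administrators habitually evade or block public opinion. A public opinion analysis system is therefore much needed to analyze and monitor online public opinion, to guard in time against the social harm caused by misleading opinion, to keep public opinion on a sound course, and to support the building of a harmonious society. This thesis analyzes the requirements of a network public opinion analysis system, proposes a system design, and implements key technologies of the system, including web page text classification and text orientation analysis. The contributions of this thesis are: 1) To address the limitations of existing general-purpose crawler techniques, a data collection method based on a crawling policy and a filtering policy is proposed, which filters out large amounts of useless information; an update strategy for the web page repository, tailored to the public opinion analysis system, is also defined to keep the local repository fresh. 2) Based on a study of Naive Bayes web page text classification, a Naive Bayes classification method improved with rough sets is proposed and applied to opinion classification in the system. 3) Existing semantics-based and machine-learning-based text orientation analysis techniques are each examined, and, combining the strengths of both, a machine learning text orientation analysis method improved with semantic information is proposed and successfully applied in the system.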

【Abstract】 With the rapid development of computer and communication technology, the Internet has become an indispensable part of people's lives. According to ITU statistics, as of December 2010 the total number of Internet users worldwide was more than 2 billion, and the number of Chinese Internet users was more than 390 million. The Internet has been recognized as "the fourth medium" after newspapers, radio, and TV. The public's rights to know, to express, to participate, and to supervise have largely been realized on the Internet. Internet users comment enthusiastically on hot issues such as people's livelihood, government management, anti-corruption, and social morality, which creates strong public-opinion pressure. The influence of the Internet has exceeded that of traditional media, and the Internet has become the most important carrier of public opinion. As public opinion on the Internet develops rapidly, analyzing and monitoring it has become increasingly important. The openness and relative freedom of the Internet allow people to speak outside the control and restrictions of the social power structure. People can freely express personal opinions, positions, and emotions, and public opinion is expressed more freely. The virtual nature of the Internet also poses a significant security risk: speakers' identities are concealed, and rules and effective oversight are lacking, so the Internet can easily become a space for some users to vent negative emotions. China was long isolated and is therefore vulnerable to external ideologies and cultures. China is also in a period of social transition, so many contradictions exist, and a few social administrators habitually avoid or block public opinion. Therefore, a public opinion analysis system is needed to monitor the network, to prevent in time the social harm caused by misleading opinion, to keep public opinion on a sound course, and to support the building of a harmonious society. In this thesis, the requirements of a public opinion analysis system were analyzed and the system was designed. Web text classification, sentiment classification, and other key technologies were implemented. The contributions of this thesis are: 1) To address the limitations of existing crawler techniques, a data collection method based on a crawling policy and a filtering policy was proposed, which can filter out a large amount of useless information. An update strategy for the local web page repository was developed for the public opinion analysis system, to ensure that the pages in the repository stay fresh. 2) Web page classification techniques based on Naive Bayes were studied. A rough-set-weighted Naive Bayes classification method was proposed and applied to the public opinion analysis system. 3) Existing text orientation analysis techniques based on machine learning and on semantic information were discussed. An improved machine learning method based on semantic information was proposed and applied to the public opinion analysis system.
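
The abstract names Naive Bayes as the basis of the text classification step but gives no details of the rough-set improvement. As a point of reference only, the following is a minimal sketch of plain multinomial Naive Bayes with add-one smoothing; it is not the thesis's rough-set method. The class name, the toy documents, and the category labels are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Plain multinomial Naive Bayes with add-one (Laplace) smoothing.

    Baseline sketch only: the rough-set improvement mentioned in the
    abstract is NOT implemented here.
    """

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one category per document.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, tokens):
        # Returns log P(class) + sum of log P(token | class) per class.
        scores = {}
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy corpus (already tokenized); not data from the thesis.
train_docs = [
    ["corruption", "official", "bribe"],
    ["official", "corruption", "investigation"],
    ["price", "housing", "rent"],
    ["housing", "price", "income"],
]
train_labels = ["anti-corruption", "anti-corruption", "livelihood", "livelihood"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict(["bribe", "official"]))
print(clf.predict(["housing", "rent"]))
```

In a real system the token lists would come from a word segmenter applied to crawled pages, and a feature selection or weighting step (such as the rough-set approach the thesis proposes) would sit between tokenization and the probability estimates.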
