

【Author】 Chen Xu (陈旭)

【Advisor】 Gao Hui (高辉)

【Author Information】 University of Electronic Science and Technology of China (电子科技大学), Computer Software and Theory, 2010, Master's

【Abstract】 Public opinion (舆论) is the attitude that people within a given social space hold toward an event as it occurs, develops, and changes. Public sentiment (舆情) is an amplified form of public opinion: it is the sum of people's views, ideas, attitudes, and opinions about social phenomena, and it is also the part of people's social and political attitudes that influences and guides the decisions of those in government. The spread of the Internet has greatly changed how public opinion arises. President Hu has pointed out that "the Internet has become a distribution center for ideological and cultural information and an amplifier of public opinion." The openness and freedom of the Internet make it convenient for people of all social strata to use, and its virtual and anonymous nature makes people more willing to express their ideas, views, attitudes, and positions online. Public opinion spreads, changes, and develops on the network, may eventually form online public sentiment, and may even affect the administration of the political system. Online public sentiment has become one of the most important components of social public sentiment and is a reflection of it. However, because online information takes many forms, the views of people from different strata differ, and the volume of information involved is extremely large, traditional collection and analysis mechanisms cannot identify public sentiment effectively. An efficient system for collecting, analyzing, and reporting public sentiment is therefore needed.

This thesis combines theoretical research with empirical research. Building on a review of the literature, it uses social network analysis techniques to mine and analyze web data. The work covers four main aspects:

(1) It first introduces a distributed, topic-focused public sentiment crawler based on multiple gateway exits, and describes in detail how the crawler system is built, the functions of its modules, its implementation, and its novel two-layer page node structure. The crawler achieves high time efficiency and application efficiency and solves the data-source problem well.

(2) After studying several classic hierarchical clustering algorithms, it proposes a new hierarchical clustering algorithm that largely removes the instability caused by manually set thresholds. The algorithm runs on the Hadoop platform to improve execution efficiency, which lays the foundation for public sentiment discovery.

(3) It proposes a method for building a three-tier social public sentiment network. The method organizes topics, categories, and events as a network that contains sub-networks, and links the sites and users involved in each topic, so that the objects of analysis are no longer isolated nodes. This three-tier network is the core of public sentiment discovery.

(4) It uses social network techniques to explore the mining of public sentiment from the three-tier network, proposes the concept of an expansion coefficient (膨胀系数), and achieves the discovery of both public sentiment and key nodes.
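As a rough illustration of item (3), the sketch below shows one way a tiered network linking topics, sites, and users might be represented, so that no node is analyzed in isolation. Everything in it is an assumption made for illustration: the abstract does not specify the tiers' exact layout, so the topic, site, and user arrangement, the class name `ThreeTierNetwork`, and its methods are all hypothetical. The key-node ranking uses a plain count of distinct topics per user. That count is a stand-in and is not the expansion coefficient from item (4), which the abstract does not define.

```python
from collections import defaultdict


class ThreeTierNetwork:
    """Hypothetical tiered structure: topic -> sites -> users (an assumption)."""

    def __init__(self):
        # Tier 1 -> tier 2: each topic maps to the sites where it is discussed.
        self.topic_sites = defaultdict(set)
        # Tier 2 -> tier 3: each (topic, site) pair maps to participating users.
        self.site_users = defaultdict(set)

    def add_post(self, topic, site, user):
        """Record that `user` posted about `topic` on `site`."""
        self.topic_sites[topic].add(site)
        self.site_users[(topic, site)].add(user)

    def user_topic_degree(self):
        """Return the number of distinct topics each user takes part in."""
        topics_by_user = defaultdict(set)
        for (topic, _site), users in self.site_users.items():
            for user in users:
                topics_by_user[user].add(topic)
        return {user: len(topics) for user, topics in topics_by_user.items()}

    def key_users(self, top_n=1):
        """Rank users by topic count (ties broken by name); a stand-in measure."""
        degree = self.user_topic_degree()
        return sorted(degree, key=lambda u: (-degree[u], u))[:top_n]
```

Because each user is reachable only through a topic and a site, any ranking computed on this structure reflects the user's position in the network rather than the user alone, which is the property item (3) emphasizes.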


