【Author】 周而重 (Zhou Erzhong)

【Supervisor】 钟宁 (Zhong Ning)

【Author Information】 Beijing University of Technology, Computer Application Technology, 2013, Ph.D.

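【Summary】 Because social software has a low barrier to use, major social networking sites have become important venues where different social groups gather and voice their views. Online public opinion is the representative opinion that netizens hold on an online topic. Its influence on real-world society keeps growing. For the sake of social stability, it is therefore of great practical significance to detect hot topics on social networking sites and to predict how public opinion on them will evolve.

The research starts from two questions. The first is how to discover, in real time and from dispersed, heterogeneous Web data, the hot topics that netizens care about. The second is how to analyze the opinion information of hot topics to predict the evolutionary trend of online public opinion.

Currently popular methods for mining online hot topics mostly rely on text mining. They use it to understand information content and discover associations between pieces of information, and thereby find hot topics in the virtual network. However, data on social networking sites reflect users' behavior patterns and thinking. Data produced by users in different roles, with different motives, at different times and in different atmospheres also carry different meanings. Data analysis strategies therefore need to consider users' own characteristics and the external factors that influence them.

The methodology proposed for the Wisdom Web of Things (W2T) provides guiding principles for problems at the intersection of the real and virtual worlds. It emphasizes the role of humans and organizes and manages data from the perspectives of information networks and information granularity. The aim is to correctly understand users' needs and provide them with intelligent information services.

The emergence of hot topics and the formation of online public opinion both result from group interaction. This research therefore takes the W2T methodology as its theoretical guide and combines relevant theories from computer science, sociology and journalism. On this basis it analyzes how blog hot topics form and develop, and how online public opinion forms and evolves. After the important influencing factors are identified, their effects are evaluated to identify blog hot topics and predict the evolutionary trend of the associated public opinion. Building on related research, the innovative work is as follows:
1) A topic model construction method based on the user's perspective is proposed. For time-sensitive online topics, traditional topic detection methods mostly use text clustering and treat each resulting document cluster as a topic. The extracted topic model therefore cannot intuitively reflect a topic's composition or the shifts of its internal focus. To address this, the proposed method focuses on the granularity of semantic expression and on the characteristics of time-sensitive online topics. Centered on the event that triggers the topic, it extracts a hierarchical topic model by analyzing how the topic's focus shifts and how the event evolves.
2) A method for identifying opinion leaders in online communities is proposed. The virtual network differs from real society, and the way online opinion leaders form and behave calls for a dedicated evaluation mechanism. Agenda setting and guiding public opinion are the main characteristics of opinion leaders. A user's influence on topic propagation is evaluated from how actively the user publishes blog posts and from the user's position in the social network. A user's role in guiding public opinion is evaluated from the quality of the user's posts and from the user's influence on the opinions of surrounding users in the social network.
3) Building on the user-perspective topic model construction method, a blog hot topic detection method is proposed. The life course of a time-sensitive online hot topic changes drastically. Yet traditional hotness evaluation mechanisms mostly assess hotness by directly counting the user attention and participation a topic receives over a period, together with its timeliness. To address this, the method tracks how a time-sensitive blog topic changes across periods in four counts: user comments, post authors, opinions and opinion leaders. These changes are used to assess the topic's current degree of growth. The topic's duration, degree of growth, user participation and novelty are then used to identify current hot topics.
4) A hot topic detection method based on bursty words is proposed. Determining hot topics by clustering hot words has been widely adopted, but different words play different roles in denoting a topic. To present topic content more clearly, this method focuses on the burstiness of associations between words. It first builds word networks for different periods from word associations and extracts the topics of each period. It then determines hot topics by their degree of burstiness. When identifying hot topics, a topic's burstiness mainly reflects users' posting behavior and opinion-interaction behavior.
5) An online public opinion evolution model based on the guiding role of opinion leaders is proposed. Traditional opinion interaction models are built on closed social networks. To model opinion evolution in a dynamic virtual network environment, the model focuses on simulating the state of online opinion leaders. The leader's personal sentiment, the network atmosphere and the characteristics of the leader's interlocutors are used to predict the leader's state at the next time step.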
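The opinion-leader criteria in item 2 can be illustrated with a minimal Python sketch. For topic spread, these are posting activity and network position. For opinion guidance, they are post quality and influence on neighbours' opinions. This is not the thesis's scoring formula, which the abstract does not give. The following are all assumptions made for illustration:

- the equal weights;
- the use of normalised in-degree on a reply graph as the "network position";
- the precomputed `quality` and `agreement` inputs.

The function name `opinion_leader_scores` is hypothetical.

```python
from collections import defaultdict


def opinion_leader_scores(posts, replies, quality, agreement,
                          weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical opinion-leader score (illustrative weights, not the thesis's).

    posts:     user -> number of topic-relevant posts
    replies:   list of (replier, author) edges in the reply network
    quality:   user -> post-quality score in [0, 1], assumed precomputed
    agreement: user -> share of neighbours whose later opinion agreed
               with the user's, in [0, 1], assumed precomputed
    """
    users = set(posts) | {u for edge in replies for u in edge}
    # Network position: normalised in-degree, i.e. how many distinct users replied.
    repliers = defaultdict(set)
    for src, dst in replies:
        if src != dst:
            repliers[dst].add(src)
    n = max(len(users) - 1, 1)
    max_posts = max(posts.values(), default=0) or 1
    w_posts, w_pos, w_quality, w_agree = weights
    scores = {}
    for u in users:
        # Influence on topic spread: posting activity plus network position.
        spread = w_posts * posts.get(u, 0) / max_posts + w_pos * len(repliers[u]) / n
        # Influence on opinion direction: post quality plus sway over neighbours.
        guidance = w_quality * quality.get(u, 0.0) + w_agree * agreement.get(u, 0.0)
        scores[u] = spread + guidance
    return scores


# Toy example: b and c reply to a, c replies to b.
posts = {"a": 10, "b": 4, "c": 1}
replies = [("b", "a"), ("c", "a"), ("c", "b")]
quality = {"a": 0.8, "b": 0.5, "c": 0.2}
agreement = {"a": 0.6, "b": 0.3, "c": 0.0}
scores = opinion_leader_scores(posts, replies, quality, agreement)
ranking = sorted(scores, key=scores.get, reverse=True)  # ['a', 'b', 'c']
```

The equal weights simply keep the spread and guidance dimensions balanced. In a real setting they would have to be tuned or learned.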

【Abstract】 Social websites have become important platforms where different social groups gather to express their views, owing to the ease of use of these sites. The online consensus is a representative opinion that the majority of users hold on an online topic. The online consensus has a great impact on the physical society. It is therefore useful to detect hot topics and predict the trend of the consensus in order to keep society harmonious.

The research has two goals. The first is to detect, from dispersed and heterogeneous Web data, the hot topics to which users pay the most attention. The second is to predict the trend of the online consensus by analyzing online opinions. Hot topic detection methods often adopt text mining techniques. They detect hot topics by understanding the content of the text and identifying correlations between pieces of information. The data from social websites represent patterns of user behavior and user ideas. Furthermore, the data can carry different meanings when factors such as user role, motivation, time and context change. Hence, the data analysis strategy needs to take into consideration the characteristics of users and the influence of the environment.

The Wisdom Web of Things (W2T) methodology was proposed to solve problems at the intersection of the offline world and the online world. In order to correctly understand user needs and provide the right services, the W2T methodology emphasizes factors related to humans, organizes the data in the form of networks, and manages the data according to information granularity.

Both hot topics and online consensus result from the interaction among users in a community. This research is therefore based on the W2T methodology and draws on theories from computer science, sociology and journalism to analyze the formation and development of blog hot topics and online consensus. The important factors that determine or influence the topics and online consensus are identified and their effects evaluated in order to detect hot topics and predict the evolutionary trend of the consensus. The main work can be described as follows:
1) A method of constructing a topic model based on user views is proposed. For temporal online topics, traditional topic detection methods often adopt text clustering and consider each cluster as a topic. Hence, the topic model produced by those methods cannot intuitively reflect the structure of a topic or the changes of its issues. The proposed method constructs a hierarchical topic model. It emphasizes the information granularity of semantic expression and the characteristics of temporal online topics. It also centers on the triggering event to analyze the components and evolution of the topic.
2) A method of identifying opinion leaders in an online community is proposed. The virtual network differs from the physical society, so the evaluation measure takes into account the formation mechanism and behavioral characteristics of opinion leaders. On the one hand, the number of relevant posts and the position in the social network are measured to assess a user's influence on topic spread. On the other hand, the quality of a user's posts and the user's influence on neighbors' opinion making are measured to assess the user's influence on the direction of online consensus.
3) Based on the method of constructing the topic model from user views, a method of blog hot topic detection is proposed. The life span of a temporal online hot topic often undergoes drastic changes. However, traditional methods often evaluate topic hotness simply by counting user participation, user attention and topical novelty during a period of time. To address this, the numbers of replies, post publishers, opinions and opinion leaders within different time intervals are used to evaluate topical growth. Blog hot topics are then identified by evaluating the duration, the degree of topical growth, user participation and topical novelty.
4) An approach to hot topic detection based on bursty words is proposed. Hot topics can be represented by hot words, but different words differ in how well they represent a topic. In order to reflect a topic clearly, the approach focuses on bursts in the correlations between words. Word networks within different time intervals are constructed according to the co-occurrence between words. Topics are then extracted from the corresponding word network. Hot topics are identified by evaluating the burst of each topic. To measure the burst feature of a topic, user behaviors such as post publication and replies to posts are counted.
5) An evolution model of online consensus based on the opinion leader's guiding role is proposed. Traditional opinion evolution models are constructed in a closed social network. In order to simulate the evolution of online consensus in a dynamic network, the model attaches more importance to opinion leaders. The sentiment of an opinion leader, the context of the network and the characteristics of the opinion communicators are assessed to predict the status of the leader in the next time interval.
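The word-network step in item 4 can be sketched in a similar way. What follows is an illustration under stated assumptions, not the thesis's algorithm. The sketch works in four steps:

1. Count word-pair co-occurrence per time interval.
2. Keep pairs whose count jumped relative to the previous interval. The `ratio` and `min_count` thresholds are hypothetical.
3. Take the connected components of the resulting bursty word network as topics.
4. Score each topic's burst by the posts that mention it plus the replies they received. This echoes the posting and reply behaviour mentioned above.

All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(posts):
    """Count in how many posts each unordered word pair co-occurs."""
    counts = Counter()
    for words, _replies in posts:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def bursty_edges(prev_posts, cur_posts, ratio=2.0, min_count=2):
    """Keep word pairs whose co-occurrence grew sharply (hypothetical rule)."""
    prev, cur = cooccurrence(prev_posts), cooccurrence(cur_posts)
    return {p for p, c in cur.items()
            if c >= min_count and c / (prev[p] + 1) >= ratio}


def components(edges):
    """Connected components of the bursty word network, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for w in parent:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


def activity(topic, posts):
    """Posts mentioning at least two topic words, plus the replies they got."""
    hits = [r for words, r in posts if len(topic & set(words)) >= 2]
    return len(hits) + sum(hits)


def hot_topics(prev_posts, cur_posts, top_k=3):
    """Rank topics of the current interval by activity growth over the previous one."""
    topics = components(bursty_edges(prev_posts, cur_posts))
    scored = [(activity(t, cur_posts) / (activity(t, prev_posts) + 1), t)
              for t in topics]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Each post is (words, number of replies); one list per time interval.
prev = [(["rain", "city"], 1), (["match", "team"], 2)]
cur = [
    (["flood", "rain", "city"], 5),
    (["flood", "rain"], 3),
    (["flood", "city", "rescue"], 4),
    (["match", "team"], 1),
]
result = hot_topics(prev, cur)  # one topic {city, flood, rain} with score 5.0
```

Connected components are only the simplest way to cut a word network into topics. A community-detection method could be substituted without changing the rest of the pipeline.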

