微博热点事件的公众情感分析研究

Study on Public Sentiment Analysis of Events in Microblogs

【Author】 崔安颀 (Cui Anqi)

【Advisor】 马少平 (Ma Shaoping)

【Author Information】 Tsinghua University, Computer Science and Technology, 2013, Ph.D.

【摘要】 微博客(以下简称:微博)是一种新兴的互联网社交媒体,用户以短文本形式实时交流个人见闻与观点,对社会事件表达情感倾向。从微博环境中发现热点事件并进行情感分析,正确评价网民的舆论,具有重要的现实意义。但针对微博的分析比较困难:微博文本长度短、内容多样性强,表达形式自由,语言较不规范。因此在微博中开展热点事件发现与情感分析的研究,具有较突出的研究意义。本文的主要工作包括:将微博话题标签行为作为发现微博热点事件的线索,通过标签分类完成微博热点事件的发现。本方法提出不稳定性程度、在线话题可能性程度与标签作者信息熵这三种度量,利用标签反映微博主题而又不依赖于文本内容的特点,识别微博中的突发热点事件、流行在线话题或广告营销内容。这一方法克服了传统突发检测方法完全基于数值变化、话题检测方法依赖文本语义的不足,在多语言微博环境中通过分类发现与真实社会事件相关的突发热点事件,去除流行在线话题或广告营销内容带来的噪声。实验表明本方法对微博话题的分类性能优于已有的标签分类算法。针对微博中表达情感的基本形式,提出基于情感记号的情感词典构造与情感分析方法。此类记号包括广义表情符号、重复标点现象、重复字母词等多种情感表达单元,适用于互联网非正式文本的环境。利用情感记号在微博文本中的同现关系,可通过迭代传播方式自动构造情感词典。与传统方法相比,本方法利用多主题微博中特有的情感记号,不限于单一语言或领域的传统情感词,实验结果显示其在多语言微博的情感分类任务中取得更优性能。面向事件类中文微博的特点,结合新词发现的方法构造适用于这一微博环境的情感词典,并完成事件的情感分析。该方法弥补了基于正式文本的分词工具应用于非正式文本时存在的不足,计算出网络新词、表情图标、错写词语以及名词实体的情感倾向,体现网络环境中旧词的新义以及网民对实体的评价,这是传统情感词典并未涵盖的。实验结果表明结合微博情感词典的情感分类结果有所提升。此外,采用不同种子词可构造不同情绪(如喜悦、愤怒、悲哀、恐惧、惊讶等)的情感词典,不限于传统的褒贬二类倾向,使微博文本的情感分析任务目标更加细致多样。

【Abstract】 Microblogging is a new social medium on the Internet. Microblog users share their personal experiences and opinions in short texts and express their attitudes towards social events. It is therefore important to discover popular and breaking events in microblogs and to analyze public sentiment about them. However, microblog analysis is difficult: the texts are short, cover diverse topics, and are written in free, usually informal, forms. Research on public sentiment analysis of events in microblogs is thus valuable. The main contributions of this thesis include:

Using hashtags as clues to discover breaking events in microblogs. This method introduces three measurements on hashtags: Instability, Twitter Meme Possibility and Authorship Entropy. Hashtags are closely related to the topic of a text but do not depend on its words. By classifying hashtags, the method recognizes breaking popular events that correspond to real social events and removes the noise introduced by online topics and advertisements in multilingual microblog messages. It overcomes the limitations of traditional burst detection methods, which ignore text content, and of topic detection methods, which rely on semantic information. Experimental results show that it achieves higher classification performance than other hashtag classification methods.

Using emotion tokens as sentiment units to construct a sentiment lexicon for sentiment analysis. Emotion tokens include emotion symbols, repeated letters and repeated punctuation, which occur frequently in informal Internet texts. Their co-occurrences can be used to construct sentiment lexicons automatically with label propagation algorithms. Compared with traditional methods, the proposed method makes use of the emotion tokens typical of microblogs. It is not restricted to any single language or domain and therefore performs better in multilingual microblog sentiment analysis, as the experimental results indicate.

Constructing Chinese microblog sentiment lexicons for event-related Chinese microblogs with out-of-vocabulary (OOV) word discovery methods, and using them for sentiment analysis of events. The proposed method reduces the errors introduced by traditional word segmentation tools and semantic dependencies. It discovers the sentiment polarities of OOV words, animated emotional icons, misspelled words and named entities, which are shaped by the opinions of microblog users but are typically excluded from traditional sentiment lexicons. Experimental results show that performance improves when entries from the constructed sentiment lexicon are taken into account. In addition, the method can be applied to construct lexicons with more dimensions (such as happiness, anger, sadness, fear and surprise) than positive and negative sentiment alone.
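The abstract names Authorship Entropy as one of the hashtag measurements but does not define it. One plausible reading, offered here only as an assumption and not as the thesis's actual formula, is the Shannon entropy of the distribution of authors who post a given hashtag: a hashtag pushed by a single account would score near zero, while one used by many different users would score high. The function name `authorship_entropy` and the sample author lists below are hypothetical.

```python
import math
from collections import Counter


def authorship_entropy(authors):
    """Shannon entropy (in bits) of the author distribution for one hashtag.

    `authors` holds one author id per post that used the hashtag.
    ASSUMPTION: this definition is illustrative; the abstract does not
    give the formula used in the thesis.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum -p * log2(p) over the share p of posts written by each author.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical data: one account posting eight times vs. eight distinct users.
single_account = ["shop_bot"] * 8
many_users = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"]

print(authorship_entropy(single_account))
print(authorship_entropy(many_users))
```

Under this reading, a low value would hint at advertising or marketing content driven by a few accounts, and a high value at a topic discussed broadly. In practice such a score would be one feature among several fed to a classifier, not a decision rule on its own.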

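  • 【Online Publication Contributor】 Tsinghua University
  • 【Online Publication Issue】 2014, Issue 06
  • 【Classification Number】 TP393.092; TP391.1
  • 【Citation Count】 2
  • 【Download Count】 2542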