Research on Chinese Phrase Sentiment Analysis Based on Chunking (基于组块分析的中文短语情感倾向研究)

【Author】 Sun Hui (孙慧)

【Advisor】 Guan Yi (关毅)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2010, Master's thesis

【Abstract】 With the rapid development of the Internet, and especially the spread of subjective media such as forums and blogs, the strict boundary between information publishers and receivers has broken down. Text is becoming one of the most important means of interaction, and the opinions it carries are drawing increasing attention from companies and governments. The same shift has made the volume of text on the web grow explosively, and text sentiment analysis, as a way to extract this opinion information automatically, has become a hot topic in natural language processing.

Text sentiment analysis examines the attitude (or opinion, or emotion) of the speaker or writer, that is, the subjective information in a text. Word-level sentiment analysis is its foundation and plays a pivotal role. Phrases bridge words and sentences; analyzing them enlarges the granularity of sentiment analysis and matters for improving sentence-level and even document-level sentiment analysis systems.

Lexicon-based methods of word sentiment analysis assign each sentiment word a fixed prior polarity, independent of context. To address this, the thesis proposes a method for obtaining the context-dependent sentiment polarity of words. Because annotated resources with context-dependent sentiment labels are lacking, a corpus is built by combining maximum-entropy-based cross-validation with manual correction. A set of contextual features is then constructed on this corpus to predict the polarity of a sentiment word in its context. Experiments show that the F-score improves by 4.9% compared with lexicon-based word sentiment analysis.

For two-word phrases, a rule-based method is used: feature templates are constructed, and pointwise mutual information (PMI) is used to compute the sentiment polarity of the chunk. The thesis also describes how degree adverbs and negation adverbs affect chunk polarity and how these words are collected.

For the more general problem of chunk sentiment analysis, the task is treated as classification. Using features such as the polarity of the words in the phrase and the phrase type, a maximum entropy model and a support vector machine (SVM) are each applied to classify chunk polarity, and the results are compared with the traditional summation-based method. The SVM gives the best results.

Finally, sentence polarity is analyzed using words and using phrases. The results show that using phrases enlarges the granularity of the analysis and greatly improves sentence-level performance. In summary, the thesis studies phrase sentiment analysis at two levels, word sentiment polarity disambiguation and phrase sentiment analysis, and the sentence-level results indicate that the system contributes positively to text sentiment analysis.
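
The abstract does not give the exact PMI formulation or the feature templates, so the following is only an illustrative sketch of one common way to use PMI for polarity: a word's orientation is taken as its PMI with positive seed words minus its PMI with negative seed words, and a two-word phrase applies a negation or degree rule to the head word's score. The toy English corpus, the seed lists, the smoothing constant `eps`, and the names `NEGATORS` and `DEGREE_WEIGHTS` (including their weights) are all assumptions made for illustration, not values from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count the sentences each word, and each unordered word pair, appears in."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs come out in sorted order
    return word_df, pair_df


def pmi(w1, w2, word_df, pair_df, n, eps=0.01):
    """PMI in bits from sentence-level co-occurrence.

    eps is an assumed smoothing constant that keeps the logarithm finite
    when the two words never co-occur. Unseen words get a neutral 0.0.
    """
    if word_df[w1] == 0 or word_df[w2] == 0:
        return 0.0
    pair = tuple(sorted((w1, w2)))
    p_joint = (pair_df[pair] + eps) / n
    return math.log2(p_joint / ((word_df[w1] / n) * (word_df[w2] / n)))


def so_pmi(word, pos_seeds, neg_seeds, word_df, pair_df, n):
    """Orientation = PMI with positive seeds minus PMI with negative seeds."""
    pos = sum(pmi(word, s, word_df, pair_df, n) for s in pos_seeds if s != word)
    neg = sum(pmi(word, s, word_df, pair_df, n) for s in neg_seeds if s != word)
    return pos - neg


# Hypothetical modifier lists; the thesis collects its own adverb lists.
NEGATORS = {"not", "never"}
DEGREE_WEIGHTS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}


def two_word_phrase_score(modifier, head, pos_seeds, neg_seeds, word_df, pair_df, n):
    """Rule sketch: negation flips the head's score, degree adverbs scale it."""
    base = so_pmi(head, pos_seeds, neg_seeds, word_df, pair_df, n)
    if modifier in NEGATORS:
        return -base
    return DEGREE_WEIGHTS.get(modifier, 1.0) * base


# Toy, already-tokenized corpus (illustrative only).
SENTENCES = [
    ["service", "good", "friendly"],
    ["food", "good", "tasty"],
    ["room", "bad", "noisy"],
    ["staff", "friendly", "good"],
    ["bed", "bad", "dirty"],
    ["food", "tasty", "good"],
]
POS_SEEDS = ["good"]
NEG_SEEDS = ["bad"]

word_df, pair_df = cooccurrence_counts(SENTENCES)
n = len(SENTENCES)

for modifier, head in [("very", "tasty"), ("not", "tasty"), ("very", "noisy")]:
    score = two_word_phrase_score(modifier, head, POS_SEEDS, NEG_SEEDS, word_df, pair_df, n)
    print(modifier, head, "positive" if score > 0 else "negative")
```

On real data the counts would come from a large segmented Chinese corpus, and the sign of the score would be only one signal among the template-based rules that the thesis describes.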
