
Research on Text Sentiment Classification (文本情感分类相关问题研究)

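[Author] Xue Luying (薛璐影)

[Advisor] Wu Xianghu (吴翔虎)

[Author information] Harbin Institute of Technology, Computer Science and Technology, 2010, Master's thesis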

[Abstract] With the rapid growth of the Internet and of its user base, a large volume of information now appears in textual form. People publish subjective content online to express their opinions, attitudes, and positive or negative judgments about products, current affairs, and other topics. Text sentiment polarity analysis draws on computational linguistics, artificial intelligence, machine learning, information retrieval, and data mining, and has broad practical value. It has become an active research topic both in China and abroad.

Sentiment polarity analysis examines the attitude (or opinion, or emotion) of the speaker or writer, that is, the subjective information in a text. Research in this area can be roughly divided into four levels: word, phrase, sentence, and document. Word-level analysis deals with words that carry sentiment, such as nouns, verbs, adjectives, and adverbs, identifying them and judging their polarity and strength. It is the prerequisite and foundation of text sentiment analysis. Sentence-level analysis targets sentences in a specific context and includes distinguishing subjective from objective sentences, judging the polarity of subjective sentences, and extracting the elements related to a sentence's sentiment. Document-level analysis judges the polarity of a text as a whole.

For the feature extraction problem in sentiment classification, this thesis compares, at both the document and the sentence level, the effect of using sentiment collocations, sentiment words, degree adverbs, negation adverbs, and word sequences as features. The experimental results show that sentiment words, degree adverbs, and negation adverbs are the main features for text sentiment classification, and that adding sentiment collocations and word sequences improves classification performance markedly.

For the feature selection problem in sentiment classification, this thesis compares, again at both the document and the sentence level, the following methods: document frequency (DF), the chi-square statistic (CHI), the combination of DF and CHI, information gain (IG), and the combination of IG with a genetic algorithm (GA). The experimental results show that the method combining IG and GA clearly improves classification performance and performs best among the methods compared.
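The abstract names its feature selection criteria but does not give their formulas. As a rough illustration only, the sketch below scores terms by document frequency, the chi-square statistic, and information gain, using the standard textbook definitions, which may differ from the exact variants used in the thesis. The function name `score_features` and the toy corpus are assumptions made for this example, and the combination of IG with a genetic algorithm is not shown.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def score_features(docs, labels):
    """Return {term: (df, chi2, ig)} for tokenised, labelled documents.

    Hypothetical helper for illustration; textbook DF, CHI, and IG only.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)

    # Document frequency of each term, overall and per class.
    df = Counter()
    df_by_class = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[label][term] += 1

    h_class = entropy([class_count[c] for c in classes])
    scores = {}
    for term, n_t in df.items():
        # Chi-square: maximum over classes of the 2x2 contingency statistic.
        chi2 = 0.0
        for c in classes:
            n11 = df_by_class[c][term]      # in class, term present
            n10 = n_t - n11                 # other classes, term present
            n01 = class_count[c] - n11      # in class, term absent
            n00 = n - n11 - n10 - n01       # other classes, term absent
            denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
            if denom:
                chi2 = max(chi2, n * (n11 * n00 - n10 * n01) ** 2 / denom)

        # Information gain: H(C) - P(t) H(C|t) - P(not t) H(C|not t).
        present = [df_by_class[c][term] for c in classes]
        absent = [class_count[c] - df_by_class[c][term] for c in classes]
        ig = (h_class
              - (n_t / n) * entropy(present)
              - ((n - n_t) / n) * entropy(absent))
        scores[term] = (n_t, chi2, ig)
    return scores


# Toy corpus (invented for illustration, not data from the thesis).
docs = [["good", "very", "phone"], ["good", "screen"],
        ["bad", "phone"], ["not", "good", "bad"]]
labels = ["pos", "pos", "neg", "neg"]
scores = score_features(docs, labels)
ranked_by_ig = sorted(scores, key=lambda t: scores[t][2], reverse=True)
```

In this toy corpus, "bad" occurs only in negative documents and so ranks first under both CHI and IG, while "phone" occurs equally in both classes and scores zero on both. A selection step would then keep the top-ranked terms under the chosen criterion.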
