节点文献

Research of Text Sentiment Classification (文本情感分类的研究)

【Author】 Zhang Yanbo (张彦博)

【Advisor】 Qu Youli (瞿有利)

【Author information】 Beijing Jiaotong University, Computer Science and Technology, 2010, Master's thesis

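【Abstract (translated from the Chinese)】 Text sentiment classification makes a categorical judgment about the sentiment orientation of a text by mining and analyzing the subjective information it contains, such as standpoints, opinions, and emotions. As people increasingly express their opinions on the web, research on text sentiment classification has become increasingly important. This thesis proposes a text sentiment classification algorithm consisting of two parts: subjectivity classification and polarity classification.

Subjectivity classification has a training process and a classification process. The training process takes a labeled training text set and obtains sentence feature representations through text preprocessing, text representation, and feature selection; a subjectivity classification model training algorithm then processes these representations to produce a text subjectivity classification model. The classification process takes a set of sentences and obtains the feature representation of each input sentence through text preprocessing, text representation, and feature selection. The text subjectivity classification algorithm, together with the classification model, then performs an initial subjective/objective classification, and dynamic programming finally corrects the results to yield the subjective text subset.

The training process of polarity classification takes a labeled source-domain text set and an unlabeled target-domain text set and obtains the training sentence feature representations of each text through text preprocessing, text representation, feature selection, and pivot-based SCL (Structural Correspondence Learning) feature selection; a polarity classification model training algorithm then processes these representations to produce a text polarity classification model. The classification process takes the set of subjective sentences and obtains the feature representation of each input sentence through the same four steps; the text polarity classification algorithm uses these representations and the polarity classification model to produce a positive sentence set and a negative sentence set.

Experiments show that the initial subjectivity classification reaches an accuracy of 94.7%; the Bayes classifier with dynamic-programming correction reaches 95.8%; and the SCL algorithm with pivot-based feature selection achieves a logistic average misclassification rate of 0.16 in polarity classification, lower than that of the standard SCL algorithm.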
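The abstract names a Bayes classifier whose sentence-level output is corrected by dynamic programming, but gives neither the model variant nor the DP objective. The Python sketch below is one possible reading, not the thesis's method. It assumes whitespace tokenization, a multinomial Naive Bayes model with add-one smoothing, hypothetical labels `subj`/`obj`, and a hypothetical DP objective (`dp_correct`) that charges a fixed `switch_penalty` whenever adjacent sentences receive different labels, solved with a Viterbi recursion.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posteriors(self, sentence):
        """Return log P(c | sentence) for each class; words unseen in training are skipped."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for w in tokenize(sentence):
                if w in self.vocab:
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        # Log-sum-exp normalization turns joint scores into log posteriors.
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        return {c: s - log_z for c, s in scores.items()}


def dp_correct(score_seq, classes, switch_penalty=1.0):
    """Viterbi search for the label sequence maximizing the summed per-sentence scores
    minus `switch_penalty` for every label change between adjacent sentences."""
    if not score_seq:
        return []

    def step(prev_label, label):
        return -switch_penalty if prev_label != label else 0.0

    best = [{c: score_seq[0][c] for c in classes}]
    back = [{}]
    for t in range(1, len(score_seq)):
        best.append({})
        back.append({})
        for c in classes:
            prev = max(classes, key=lambda p: best[t - 1][p] + step(p, c))
            best[t][c] = best[t - 1][prev] + step(prev, c) + score_seq[t][c]
            back[t][c] = prev
    # Backtrack from the best final label.
    path = [max(classes, key=lambda c: best[-1][c])]
    for t in range(len(score_seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    train = ["i love it", "i hate it", "it has a screen", "it weighs 100 grams"]
    labels = ["subj", "subj", "obj", "obj"]
    nb = NaiveBayes().fit(train, labels)
    review = ["i love the screen", "it has a battery", "i hate the sound"]
    posteriors = [nb.log_posteriors(s) for s in review]
    print("initial:  ", [max(p, key=p.get) for p in posteriors])
    print("corrected:", dp_correct(posteriors, nb.classes, switch_penalty=1.0))
```

Larger `switch_penalty` values smooth the labels more aggressively; with `switch_penalty=0` the correction reduces to taking each sentence's most probable label.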

【Abstract】 Text sentiment classification automatically classifies a text as expressing positive or negative sentiment by mining and analyzing the subjective information in it, such as standpoints, views, and moods. As more and more people express their viewpoints on the web, text sentiment classification becomes increasingly important. This paper presents and implements a text sentiment classification algorithm with two steps: subjectivity classification and polarity classification.

Subjectivity classification contains two procedures: training and classification. In the training procedure, feature representations of sentences are obtained from a labeled training text set via text preprocessing, text representation, and feature selection; a text subjectivity classification model is then obtained via a subjectivity classification model training algorithm. In the classification procedure, feature representations of the sentences to be classified are obtained via text preprocessing, text representation, and feature selection; the text subjectivity classification algorithm, together with the classification model, then classifies the sentences into an objective subset and a subjective subset, and finally the results are corrected by dynamic programming.

In the training procedure of polarity classification, a labeled source-domain text set and an unlabeled target-domain text set are combined as the training set; feature representations of its sentences are obtained via text preprocessing, text representation, and SCL (Structural Correspondence Learning) feature selection based on pivot features; a text polarity classification model is then obtained via a polarity classification model training algorithm.
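SCL relies on pivot features that behave similarly in both the source and target domains. The abstract does not say how its pivots are chosen, so the sketch below assumes a common recipe: keep words that occur in at least `min_df` documents of both domains, then rank them by mutual information with the source-domain labels. The names `select_pivots` and `min_df`, the whitespace tokenization, and the binary labels are hypothetical. Full SCL would go on to train one linear predictor per pivot (predicting the pivot's presence from non-pivot features on unlabeled text from both domains) and take an SVD of the stacked predictor weights; that step is omitted here.

```python
import math
from collections import Counter


def tokens(doc):
    # Hypothetical preprocessing: lowercase, whitespace split, binary presence.
    return set(doc.lower().split())


def doc_freq(docs):
    """Number of documents containing each token."""
    df = Counter()
    for d in docs:
        df.update(tokens(d))
    return df


def mutual_information(feature, docs, labels):
    """Mutual information (nats) between the presence of `feature` and the label."""
    n = len(docs)
    joint = Counter((feature in tokens(d), y) for d, y in zip(docs, labels))
    f_marg, y_marg = Counter(), Counter()
    for (f, y), k in joint.items():
        f_marg[f] += k
        y_marg[y] += k
    mi = 0.0
    for (f, y), k in joint.items():
        p_fy = k / n
        mi += p_fy * math.log(p_fy / ((f_marg[f] / n) * (y_marg[y] / n)))
    return mi


def select_pivots(source_docs, source_labels, target_docs, min_df=2, num_pivots=10):
    """Keep tokens found in at least `min_df` documents of BOTH domains, then rank
    them by MI with the source labels (ties broken alphabetically)."""
    src_df, tgt_df = doc_freq(source_docs), doc_freq(target_docs)
    candidates = [w for w in src_df if src_df[w] >= min_df and tgt_df[w] >= min_df]
    candidates.sort(key=lambda w: (-mutual_information(w, source_docs, source_labels), w))
    return candidates[:num_pivots]


if __name__ == "__main__":
    books = ["great plot great read", "great book", "boring plot", "boring and dull"]
    book_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative
    electronics = ["great battery", "boring design", "great screen", "boring sound"]
    print(select_pivots(books, book_labels, electronics))
```

Requiring frequency in both domains keeps domain-specific words such as "plot" out of the pivot set even when they appear often in the labeled source data.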
In the classification procedure, feature representations of the sentences to be classified are obtained from the subjective text via text preprocessing, text representation, and SCL feature selection based on pivot features; the text polarity classification algorithm, together with the classification model, then classifies the sentences into a positive sentence subset and a negative sentence subset. Experiments indicate that the precision of the preliminary subjectivity classification is 94.7%, the precision of the Bayes classifier with dynamic-programming correction is 95.8%, and the LAMP (Logistic Average Misclassification Percentage) of the SCL algorithm based on pivot features is 0.16, lower than that of the standard SCL algorithm.
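The logistic average misclassification figure reported here combines the two per-class error rates on a log-odds scale. A minimal sketch, assuming the definition commonly used in spam-filter evaluation (the inverse logit of the mean of the logits of the false-negative and false-positive rates), binary labels with `1` as the positive class, and a hypothetical clamping constant `eps` so that a zero error rate does not make the logit infinite:

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def lam(y_true, y_pred, positive=1, eps=1e-6):
    """Logistic average misclassification (as a fraction): inverse logit of the
    mean of logit(false-negative rate) and logit(false-positive rate)."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg_preds = [p for t, p in zip(y_true, y_pred) if t != positive]
    fnr = sum(p != positive for p in pos_preds) / len(pos_preds)  # positives labeled negative
    fpr = sum(p == positive for p in neg_preds) / len(neg_preds)  # negatives labeled positive
    # Assumption: clamp the rates away from 0 and 1 so the logit stays finite.
    fnr = min(max(fnr, eps), 1.0 - eps)
    fpr = min(max(fpr, eps), 1.0 - eps)
    mean_logit = (logit(fnr) + logit(fpr)) / 2.0
    return 1.0 / (1.0 + math.exp(-mean_logit))


if __name__ == "__main__":
    y_true = [1] * 10 + [0] * 5
    y_pred = [0] + [1] * 9 + [1, 1, 0, 0, 0]  # FNR = 0.1, FPR = 0.4
    print(round(lam(y_true, y_pred), 4))
```

Because it averages log-odds, LAM equals the common rate when the two error rates are equal and otherwise lies between them, so a classifier cannot lower it by sacrificing one class entirely.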

  • 【Classification number (CLC)】 TP391.1
  • 【Times cited】 3
  • 【Downloads】 387
节点文献中: 

本文链接的文献网络图示:

本文的引文网络