Research on Several Key Technical Issues in Fine-Grained Sentiment Classification (original title: 细颗粒度情感倾向分析若干关键问题研究)

【Author】 Zhang Qi (张奇)

【Advisor】 Wu Lide (吴立德)

【Author information】 Fudan University, Computer Application Technology, 2008, Ph.D.

【Abstract】 With the growth of the Internet and information processing technology, people can obtain massive amounts of opinionated text from news comments, forums, blogs, and similar sources. This raw data only becomes useful once it has been analyzed and distilled, which has made sentiment analysis of text a novel research area with broad application prospects. This dissertation studies several key techniques in fine-grained sentiment analysis: target (evaluated object) extraction, evaluation relation extraction, sentiment orientation determination, semi-automatic knowledge base (ontology) construction, and the application of semi-supervised learning to sentiment analysis. For document- and sentence-level polarity classification, we combine the conditional maximum entropy model with the entropy regularization framework and propose a semi-supervised conditional maximum entropy algorithm. On the sentence-level MPQA corpus it achieves 78.2% accuracy, a 5.2% relative improvement over the supervised method. For target extraction, we propose an algorithm based on conditional random fields that casts target extraction as a sequence labeling problem and uses context, part-of-speech, and knowledge base features; it reaches a target identification precision of 91.17%. For evaluation relation extraction, we propose a method that likewise converts relation identification into a sequence labeling problem solved with conditional random fields. It combines syntactic-level and word-level information with the classification results of neighboring relations, and experiments show that its F-score is 15% (relative) higher than that of the nearest-neighbor baseline. For model adaptation, we propose a maximum a posteriori (MAP) adaptation algorithm for conditional random field models. Experiments show that it can effectively adapt a background model to another domain using a small amount of domain-specific labeled adaptation data; in the target extraction experiments, the adapted model achieves a 34% relative improvement over the unadapted background model. For knowledge base construction, we propose a weakly supervised bootstrapping algorithm based on graph mutual reinforcement, which uses a weak classifier to learn lexicon entries automatically from a small set of seed words and a large unlabeled corpus; combined with manual judgment, this builds the required knowledge base semi-automatically. Finally, building on the above work, we implement a practical sentiment analysis system for the automotive domain.
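
The abstract describes casting target identification as a sequence labeling problem. A minimal sketch of what such a conversion can look like is given below. The BIO tag scheme, the tag name `TARGET`, the helper names `spans_to_bio` and `bio_to_spans`, and the example sentence are all assumptions made for illustration; the abstract does not specify the dissertation's actual tag set or features. A sequence model such as a conditional random field would be trained to predict these per-token tags.

```python
from typing import List, Tuple

# Assumed tag scheme (not stated in the abstract): B-TARGET starts a target,
# I-TARGET continues it, and O marks every other token.


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Convert target spans given as [start, end) token offsets into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-TARGET"
        for i in range(start + 1, end):
            tags[i] = "I-TARGET"
    return tags


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Recover [start, end) target spans from a sequence of BIO tags."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-TARGET":
            if start is not None:
                spans.append((start, i))  # close the previous target
            start = i
        elif tag == "I-TARGET" and start is not None:
            continue  # still inside the current target
        else:
            # "O", or an I- tag with no open target: close any open target.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical automotive review sentence with two evaluated targets.
tokens = ["The", "fuel", "economy", "is", "excellent", "but",
          "the", "brakes", "feel", "soft"]
target_spans = [(1, 3), (7, 8)]  # "fuel economy", "brakes"
tags = spans_to_bio(tokens, target_spans)
```

Decoding the predicted tags with `bio_to_spans` yields the extracted targets, so evaluation can be done at the span level rather than per token.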

  • 【Online publication contributor】 Fudan University
  • 【Online publication year/issue】 2009, Issue 08