Research on Opinion Holder Extraction
Research on Identifying Opinion Holder in Opinionated Sentences
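【Author】 Kuang Yuan
【Supervisor】 He Huacan
【Author Information】 Beijing University of Posts and Telecommunications, Pattern Recognition and Intelligent Systems, 2011, Master's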
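【Abstract (Chinese version)】 Sentiment analysis mainly aims to automatically obtain useful opinion information and related knowledge from subjective text units. With the rapid development of the Internet and the information industry, large numbers of users publish their opinions and views on forums, blogs, and other platforms, on almost every topic imaginable. In sentiment analysis, the person or entity who states or originates each opinion or view (the opinion holder) must be extracted, so that people's views on social or public issues can be grasped more fully and better-founded measures can be formulated or statements made. Opinion holder extraction based on natural language processing methods therefore has significant research value. For corpora from different domains, this thesis extracts opinion holders separately with statistical and rule-based methods, and finally combines the two approaches. The main contributions are as follows. First, by analyzing the definition of an opinion holder, six corresponding features are extracted and proposed: words, subjective-expression trigger words, part-of-speech tags, named entities, dependency relations, and sentence structure; a feature observation window is defined for each feature to capture its context as precisely as possible. Second, through syntactic parsing, two syntactic rules based on subjective-expression trigger words are defined for extracting opinion holders, and an opinion holder extraction algorithm built on these rules is designed. Finally, extraction based on conditional random fields (CRF) is combined with rule-based extraction: the results produced by the syntactic rules undergo syntactic path mining and confidence analysis, and the corresponding features are then selected as CRF training features. The combined extraction achieves relatively high precision and recall, with fairly satisfactory results. A limitation is that anaphora resolution is not performed; future work will add anaphora resolution and apply semantic disambiguation to further improve the precision of opinion holder identification.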
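The idea of a feature observation window can be illustrated with a minimal Python sketch. It assumes a ±2-token window, a small hypothetical trigger-word set (`TRIGGERS`), and pre-computed POS and named-entity tags; the thesis's actual window sizes and trigger lexicon are not given here, and the dependency and sentence-structure features are omitted. Each token receives one feature dict, in the dictionary-per-token style used by CRF toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical trigger lexicon for illustration only; not the thesis's word list.
TRIGGERS = {"认为", "表示", "said", "believes", "criticized"}

Feature = Union[str, bool, float]


def token_features(tokens: Sequence[str], pos_tags: Sequence[str],
                   ne_tags: Sequence[str], window: int = 2) -> List[Dict[str, Feature]]:
    """Build one feature dict per token, covering neighbours within +/- window."""
    if not (len(tokens) == len(pos_tags) == len(ne_tags)):
        raise ValueError("tokens, pos_tags and ne_tags must have equal length")
    n = len(tokens)
    feats: List[Dict[str, Feature]] = []
    for i in range(n):
        f: Dict[str, Feature] = {"bias": 1.0}
        for off in range(-window, window + 1):
            j = i + off
            if 0 <= j < n:
                # Word, POS, named-entity and trigger features at relative offset `off`.
                f[f"{off}:word"] = tokens[j]
                f[f"{off}:pos"] = pos_tags[j]
                f[f"{off}:ne"] = ne_tags[j]
                f[f"{off}:trigger"] = tokens[j] in TRIGGERS
            else:
                # The window extends past the sentence boundary at this offset.
                f[f"{off}:pad"] = True
        feats.append(f)
    return feats


if __name__ == "__main__":
    sent = ["Obama", "criticized", "the", "plan"]
    for d in token_features(sent, ["NNP", "VBD", "DT", "NN"], ["PERSON", "O", "O", "O"]):
        print(d)
```

With this layout the CRF can learn, for example, that a PERSON entity immediately followed by a trigger word is a likely holder, because `"0:ne"` and `"1:trigger"` appear together in the same token's features.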
【Abstract】 Sentiment analysis mainly aims to automatically obtain useful opinion information and related knowledge from subjective texts. With the development of the Internet and the information industry, many users post their reviews on forums, blogs, and other platforms, covering nearly every conceivable topic. In sentiment analysis, it is important to identify the author or originator of each expressed view, that is, the opinion holder, in order to understand clearly how people think about social or public issues and thereby devise better measures and make more appropriate statements. Identifying opinion holders with natural language processing techniques is therefore of great value. In this thesis, opinion holders in corpora from different domains are identified separately with statistical and rule-based methods, and the two sets of results are then combined to obtain the final identification result. The main results of this thesis are as follows. Firstly, by analyzing the definition of an opinion holder, six relevant features are extracted and proposed: lexical (word) features, opinionated_trigger words, POS tags, named entities, dependency relations, and sentence structure; feature observation windows are designed to capture the contextual information of each feature as precisely as possible. Secondly, by analyzing the hierarchical structure of a large number of parse trees, we propose two novel syntactic rules based on opinionated_trigger words, together with an extraction algorithm that applies these two rules to identify opinion holders directly from parse trees. Finally, a method combining conditional random fields (CRF) with the syntactic rules is proposed, in which the syntactic rules contribute three additional CRF features obtained through the feature extraction algorithm we designed. The combined method achieves high precision and recall, with satisfactory results.
However, anaphora resolution is not used in this study; in future work, anaphora resolution combined with semantic disambiguation will be used to further improve the accuracy of opinion holder identification.
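The two trigger-word rules themselves are not spelled out in this abstract, so the following Python sketch only illustrates the general shape of rule-based holder extraction and how its output can be turned into extra CRF features. It swaps in a dependency representation (Stanford-style labels `nsubj` and `poss`) instead of the constituency parse trees the thesis analyzes, and both rules, the trigger sets, and the `rule_holder` feature name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical trigger sets for illustration only.
TRIGGER_VERBS = {"said", "criticized", "believes", "认为", "表示"}
TRIGGER_NOUNS = {"view", "opinion", "观点"}


@dataclass
class Tok:
    idx: int   # 1-based position in the sentence
    word: str
    head: int  # index of the governing token, 0 for the root
    dep: str   # dependency label, e.g. "nsubj", "poss"


def extract_holders(sent: List[Tok]) -> List[int]:
    """Return indices of tokens that the two illustrative rules mark as holders."""
    holders: List[int] = []
    for t in sent:
        if t.word in TRIGGER_VERBS:
            wanted = "nsubj"  # Rule A (hypothetical): subject of a trigger verb
        elif t.word in TRIGGER_NOUNS:
            wanted = "poss"   # Rule B (hypothetical): possessor of a trigger noun
        else:
            continue
        holders += [d.idx for d in sent if d.head == t.idx and d.dep == wanted]
    return holders


def rule_features(sent: List[Tok]) -> List[Dict[str, bool]]:
    """Expose the rule output as one boolean feature per token for a CRF."""
    marked = set(extract_holders(sent))
    return [{"rule_holder": t.idx in marked} for t in sent]


if __name__ == "__main__":
    s1 = [Tok(1, "Obama", 2, "nsubj"), Tok(2, "criticized", 0, "root"),
          Tok(3, "the", 4, "det"), Tok(4, "plan", 2, "dobj")]
    s2 = [Tok(1, "Mary", 3, "poss"), Tok(2, "'s", 1, "case"), Tok(3, "view", 5, "nsubj"),
          Tok(4, "is", 5, "cop"), Tok(5, "wrong", 0, "root")]
    print(extract_holders(s1), extract_holders(s2))
    print(rule_features(s1))
```

The rules here mark only the head token of the holder, not the full noun phrase, and the single `rule_holder` flag stands in for the three path-mining and confidence features described in the abstract, which are not reproduced. In a combined setup, each token's `rule_features` dict would be merged into its window-based feature dict before CRF training.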
【Key words】 natural language processing; opinion holder identification; conditional random fields; syntactic rule; feature;