产品描述词及情感词抽取模式的研究

Research on Extraction Patterns of Product Description Words and Sentiment Words

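【Author】 Zhao Wenjing (赵文婧)

【Advisor】 Zhou Yanquan (周延泉)

【Author information】 Beijing University of Posts and Telecommunications, Signal and Information Processing, 2010, Master's thesis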

【Abstract】 With the rapid growth of the World Wide Web, the Internet has become an ideal platform for exchanging opinions and expressing views. More and more users post product reviews on forums and similar platforms, and these reviews are valuable to both consumers and product manufacturers. However, the volume of reviews is growing so quickly that collecting and analyzing them by hand is difficult, so product review analysis based on natural language processing has significant research value. Extracting product description (feature) words and sentiment words is a key step in such analysis. Focusing on Chinese reviews, this thesis draws on theories and methods from computational linguistics and statistics and works at two levels of linguistic granularity, part of speech (POS) and syntactic parsing, to mine templates that relate product feature words to their corresponding sentiment words, and explores new algorithms for extracting both from Chinese review sentences. The thesis proposes an extraction algorithm that, for corpora from different domains and given domain-related seed words, uses these templates to extract product feature words and their corresponding sentiment words from the corpus through mutual, iterative inference. Experimental results show that the algorithm requires little manual intervention, achieves good and stable performance across domains, and outperforms existing methods; the templates and the extraction algorithm are also domain-independent. The contributions of this thesis are: 1) it uses the relationship between product feature words and their corresponding sentiment words to extract the two kinds of words mutually and iteratively; 2) the extraction algorithm it designs is domain-independent and outperforms existing methods. The algorithm needs no domain-specific training corpus: given only a handful of domain-related seed words, the POS templates, syntax-tree paths, and extraction algorithm can be applied directly to other domains.
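The mutual, iterative extraction described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not list the actual templates, so the three POS templates below, the PKU-style tags (`n` noun, `d` adverb, `a` adjective, `u` auxiliary), the function names `match_templates` and `bootstrap`, and the toy corpus are all assumptions made for illustration. The sketch covers only the POS-template level, not the syntax-tree paths, and it assumes the reviews have already been segmented and tagged.

```python
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]      # (word, POS tag), input is assumed pre-tagged
Template = Tuple[str, ...]   # slots such as "F:n" (feature), "S:a" (sentiment), "d"

# Hypothetical POS templates, NOT the ones mined in the thesis.
TEMPLATES: List[Template] = [
    ("F:n", "S:a"),        # noun + adjective
    ("F:n", "d", "S:a"),   # noun + adverb + adjective
    ("S:a", "u", "F:n"),   # adjective + auxiliary + noun
]

def match_templates(sentence: List[Token],
                    templates: Iterable[Template]) -> List[Tuple[str, str]]:
    """Return every (feature, sentiment) candidate pair a template matches."""
    pairs = []
    for template in templates:
        width = len(template)
        for start in range(len(sentence) - width + 1):
            feature = sentiment = None
            for (word, tag), slot in zip(sentence[start:start + width], template):
                role, _, wanted = slot.rpartition(":")  # "d" gives role ""
                if tag != wanted:
                    break
                if role == "F":
                    feature = word
                elif role == "S":
                    sentiment = word
            else:  # reached only when every slot matched
                pairs.append((feature, sentiment))
    return pairs

def bootstrap(corpus: List[List[Token]], seed_features: Iterable[str],
              templates: Iterable[Template] = TEMPLATES) -> Tuple[Set[str], Set[str]]:
    """Grow feature and sentiment sets from seeds until nothing new is found."""
    features: Set[str] = set(seed_features)
    sentiments: Set[str] = set()
    templates = list(templates)
    pairs = [p for sent in corpus for p in match_templates(sent, templates)]
    changed = True
    while changed:
        changed = False
        for feature, sentiment in pairs:
            # A known feature word licenses its paired sentiment word ...
            if feature in features and sentiment not in sentiments:
                sentiments.add(sentiment)
                changed = True
            # ... and a known sentiment word licenses its paired feature word.
            if sentiment in sentiments and feature not in features:
                features.add(feature)
                changed = True
    return features, sentiments

# Toy, hand-tagged phone reviews (invented for this example).
corpus = [
    [("屏幕", "n"), ("很", "d"), ("清晰", "a")],   # "the screen is very clear"
    [("照片", "n"), ("很", "d"), ("清晰", "a")],   # "the photos are very clear"
    [("照片", "n"), ("模糊", "a")],                # "the photos are blurry"
    [("耐用", "a"), ("的", "u"), ("电池", "n")],   # "a durable battery"
]
features, sentiments = bootstrap(corpus, seed_features={"屏幕"})
```

Starting from the single seed 屏幕 (screen), the sketch reaches 清晰 (clear), then 照片 (photos), then 模糊 (blurry). The pair 电池/耐用 (battery/durable) matches a template but is never linked to a known word, so it stays out; in this simplified version a candidate is accepted only when its partner is already known, which is what keeps the expansion tied to the seed domain.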