

Product Review Mining Based on Features Set

【Author】 靳亚辉

【Advisor】 黄威

【Author Information】 Huazhong University of Science and Technology, Logistics Engineering, 2011, Master's

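【Summary】 With the rise and spread of Web 2.0 and the rapid growth of e-commerce, more and more consumers shop online and post product reviews. These reviews have become an important source of product information for potential consumers and influence, to some extent, their purchasing behavior. Because such reviews are unstructured and scattered, product review mining applies natural language processing techniques to analyze them automatically, helping companies and individuals obtain this information conveniently and effectively. This thesis studies feature-based product review mining. After analyzing the shortcomings of existing product feature identification methods, it proposes building a product feature set in order to mine and summarize review information more effectively. First, product feature words are extracted manually from the product manual and a small number of review texts, and a feature set for the product category is built following the proposed construction approach. Pointwise mutual information (PMI) is then used to identify new product feature words in new review texts, so that the feature set is extended dynamically. Second, the positive and negative evaluation words in HowNet form a seed set of sentiment words; the synonym and antonym sets in WordNet are used to predict the sentiment orientation of opinion words in the reviews and thereby expand the seed set. Third, based on the numbers of feature words, sentiment words, and negation words in a review sentence, sentiment scores for product features are computed using conjunctions and the proximity principle, and the hierarchical structure of the feature set is used to aggregate the scores layer by layer from the bottom up, yielding opinion scores at every level of the product. Finally, taking all user reviews of the Canon PowerShot SD780 IS camera on www.Amazon.com as a sample, the thesis obtains opinion mining results for this camera and presents them from both a partial and an overall perspective, using the product feature set and product evaluation indicators.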
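The PMI step mentioned above can be illustrated with a minimal sketch. It assumes sentence-level co-occurrence counts over tokenized reviews and an acceptance threshold; the function names `pmi` and `expand_feature_set`, the toy data, and the threshold value are hypothetical and are not taken from the thesis.

```python
import math


def pmi(sentences, word_a, word_b):
    """Pointwise mutual information of two words.

    Assumption: probabilities are estimated from sentence-level
    co-occurrence, i.e. PMI = log2(p(a, b) / (p(a) * p(b))).
    """
    n = len(sentences)
    count_a = sum(1 for s in sentences if word_a in s)
    count_b = sum(1 for s in sentences if word_b in s)
    count_ab = sum(1 for s in sentences if word_a in s and word_b in s)
    if count_a == 0 or count_b == 0 or count_ab == 0:
        # The words never co-occur, so the log is undefined.
        return float("-inf")
    return math.log2((count_ab * n) / (count_a * count_b))


def expand_feature_set(sentences, known_features, candidates, threshold=0.5):
    """Return the candidates whose best PMI with any known feature
    reaches the (hypothetical) threshold."""
    accepted = set()
    for cand in candidates:
        best = max(
            (pmi(sentences, cand, feat) for feat in known_features),
            default=float("-inf"),
        )
        if best >= threshold:
            accepted.add(cand)
    return accepted


# Toy corpus: each review sentence is represented as a set of tokens.
reviews = [
    {"lens", "zoom"},
    {"lens", "zoom"},
    {"battery"},
    {"price"},
]
new_features = expand_feature_set(reviews, {"lens"}, ["zoom", "price"])
```

In this toy corpus "zoom" always appears with the known feature "lens", so it would be added to the feature set, while "price" never co-occurs with it and is rejected.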

【Abstract】 With the rapid development of e-commerce and the popularization of Web 2.0, more and more consumers shop online and post product reviews. These customer reviews are an important source of product information for potential customers and may influence their purchasing behavior. Because such opinions are unstructured and scattered, product opinion mining uses NLP techniques to analyze them automatically, helping companies and individuals obtain this information effectively and easily. We study the problem of opinion mining at the feature level. After analyzing the limitations of existing methods for product feature identification, we propose a method based on a product feature system to better mine and summarize customer reviews. First, we manually extract product features from the product's user guide and a small number of product reviews, and establish a product feature system for the product category according to our approach to building such systems. We then identify new product features in additional reviews using pointwise mutual information (PMI), so that the product feature system is extended dynamically. Second, we use the positive and negative words in HowNet as the opinion seed list, and we expand the seed list by using the adjective synonym and antonym sets in WordNet to predict the semantic orientation of adjectives. Third, according to the numbers of feature words, opinion words, and negation words in an opinion sentence, we calculate the sentiment score of each product feature using conjunctions and the principle of proximity. We then aggregate the feature scores from the lowest layer upward, using the hierarchy of the product feature system.
Finally, we take the product reviews of the Canon PowerShot SD780 IS on www.Amazon.com as a sample and, based on the preceding research, analyze them to obtain the opinion mining results for this camera. The partial and overall results are presented using the product feature system and the product evaluation indicators.
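The bottom-up aggregation over the feature hierarchy described above can be sketched as follows. The abstract does not state the aggregation rule, so this sketch assumes a plain sum of a feature's own score and the totals of its sub-features; the function name `aggregate_scores`, the example hierarchy, and the scores are all hypothetical.

```python
def aggregate_scores(tree, scores, root):
    """Aggregate sentiment scores from the leaves up to the root.

    tree   : dict mapping a feature to the list of its sub-features
    scores : dict mapping a feature to the score computed directly for it
    Assumption: a node's total is its own score plus its children's totals.
    """
    totals = {}

    def visit(node):
        total = scores.get(node, 0.0)
        for child in tree.get(node, []):
            total += visit(child)
        totals[node] = total
        return total

    visit(root)
    return totals


# Hypothetical two-level hierarchy for a camera.
hierarchy = {
    "camera": ["lens", "battery"],
    "lens": ["zoom", "focus"],
}
direct_scores = {"zoom": 2, "focus": -1, "lens": 1, "battery": 1}
totals = aggregate_scores(hierarchy, direct_scores, "camera")
```

A sum keeps frequently discussed features dominant at higher levels; a mean or a weighted combination would be an equally plausible choice if per-feature comparability matters more than review volume.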

  • 【Classification Number】TP393.0; F274
  • 【Downloads】169

