
Towards Domain Independent Product Review Analysis (领域无关的产品评论分析研究)

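【Author】 Cen Songxiang (岑松祥)

【Advisor】 Zhong Yixin (钟义信)

【Author Information】 Beijing University of Posts and Telecommunications, Signal and Information Processing, 2009, Master's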

【Abstract】 The large volume of product reviews published online is useful to consumers who intend to buy a product, and because it comes directly from users it also gives manufacturers valuable feedback for improving product quality. However, no one can read such a mass of reviews one by one, so much of the useful information remains effectively inaccessible. A review analysis system built on natural language processing technology can help people analyze reviews and understand the products concerned. Product review analysis generally consists of two parts: extraction of product-related entities and opinion (polarity) judgment, each of which has been addressed by many researchers. Although such systems are attracting growing attention, cross-domain application remains poorly solved: most supervised algorithms are limited by the lack of training corpora in the new domain, and building training samples for both sub-tasks is costly. This thesis presents a preliminary study of domain-independent product review analysis. For the first sub-task, it proposes a semi-supervised extraction algorithm: given the review corpus of a specific domain and a few predefined seed words, the algorithm extracts the product description words in the corpus. For the second, it proposes an unsupervised sentiment classification algorithm that exploits the observation that search engine retrieval results reflect, to some extent, how closely words are related. Experiments show that the proposed algorithms are fairly effective on the cross-domain problem.
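The abstract does not spell out how search results are turned into a sentiment decision. As a minimal sketch of one common way to use that idea, and not necessarily the thesis's own algorithm, a word's orientation can be scored as the difference between its pointwise mutual information (PMI) with a positive seed word and with a negative seed word, where the PMI is estimated from hit counts. Everything named here is an assumption made for illustration: the seed words "excellent" and "poor", the smoothing constant, the index size `TOTAL`, and the toy table `TOY_HITS`, which stands in for a real search engine so that the example runs offline.

```python
import math

# Hypothetical hit counts standing in for search engine results (illustration only).
TOY_HITS = {
    ("excellent",): 5000,
    ("poor",): 5000,
    ("sturdy",): 1000,
    ("flimsy",): 1000,
    ("excellent", "sturdy"): 200,
    ("poor", "sturdy"): 20,
    ("excellent", "flimsy"): 15,
    ("flimsy", "poor"): 180,
}
TOTAL = 1_000_000  # assumed number of indexed documents


def toy_hit_count(*terms):
    """Return the (made-up) number of documents containing all the given terms."""
    return TOY_HITS.get(tuple(sorted(terms)), 0)


def pmi_from_hits(hits_xy, hits_x, hits_y, total, smoothing=0.01):
    """Estimate PMI(x, y) = log2(p(x, y) / (p(x) * p(y))) from hit counts."""
    p_xy = (hits_xy + smoothing) / total
    p_x = (hits_x + smoothing) / total
    p_y = (hits_y + smoothing) / total
    return math.log2(p_xy / (p_x * p_y))


def orientation(word, hit_count, total, pos_seed="excellent", neg_seed="poor"):
    """Score > 0 means the word co-occurs more with the positive seed."""
    pos = pmi_from_hits(hit_count(word, pos_seed), hit_count(word),
                        hit_count(pos_seed), total)
    neg = pmi_from_hits(hit_count(word, neg_seed), hit_count(word),
                        hit_count(neg_seed), total)
    return pos - neg


def classify(word, hit_count=toy_hit_count, total=TOTAL):
    """Label a word by the sign of its orientation score; no training data needed."""
    return "positive" if orientation(word, hit_count, total) > 0 else "negative"
```

With the toy counts above, `classify("sturdy")` yields a positive label and `classify("flimsy")` a negative one. Because the score relies only on co-occurrence counts and two seed words, it needs no labeled corpus for the target domain, which is the property the abstract emphasizes.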
