

Research on Product Opinion Mining Algorithms for Web Texts

【Author】 肖芬 (Xiao Fen)

【Advisor】 徐蔚然 (Xu Weiran)

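【Author Information】 Beijing University of Posts and Telecommunications, Pattern Recognition and Intelligent Systems, 2010, Master's thesis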

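【Chinese Abstract】 With the widespread use of the Internet, Web sites such as blogs, BBS forums, and wikis now host large numbers of customer reviews of products and services. Targeting such Web review texts, this thesis studies how to extract product feature words and opinion (sentiment) words from text and then determine the polarity of the opinions customers hold. Experiments show that the methods used are applicable, and the system built on them is robust. The main research content is as follows: (1) Using network resources, we first apply pattern-matching information extraction based on HTML tags to extract product feature words from specific Web pages and build a basic dictionary of evaluation targets; we then collect review texts through search engines, extract sentiment words from them, compute the orientation of these words based on HowNet, and build a sentiment lexicon with colloquial characteristics. (2) We extract feature words using Chinese dependency parsing combined with other semantic features to enlarge the feature dictionary; we then use a bipartite graph model to repeatedly co-train feature words and sentiment words, finally writing the newly learned feature words and sentiment words into their respective dictionaries and writing matched feature and sentiment words to text as two-tuples. (3) We manually construct lists of negation words, turning (adversative) words, and degree words, then define a scoring model for review sentiment words, score the extracted sentiment words, and finally determine their polarity, that is, the opinion or attitude the customer holds toward a product feature. Through this work, the thesis implements opinion mining for Web texts, namely the extraction of feature and sentiment words and the positive/negative analysis of opinions, and builds the related resources. Finally, the thesis explores how to extend the approach across domains, showing its feasibility to some extent.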
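The scoring model in item 3 above combines sentiment words with negation, degree, and turning words, but the abstract does not give its formula. The following is a minimal sketch under stated assumptions, not the thesis's model: the tiny lexicons, the names `score_opinion` and `polarity`, the multiplicative degree weights, and the `turn_weight` discount applied before a turning word are all hypothetical. Tokens are assumed to be already segmented; English tokens are used only for readability.

```python
# Hypothetical toy lexicons: illustrative stand-ins, NOT the thesis's hand-built word lists.
SENTIMENT = {"good": 1.0, "clear": 1.0, "durable": 1.0, "bad": -1.0, "blurry": -1.0}
NEGATION = {"not", "no", "never"}
DEGREE = {"slightly": 0.5, "very": 1.5, "extremely": 2.0}
TURNING = {"but", "however"}


def score_opinion(tokens, turn_weight=0.5):
    """Score a pre-segmented review with a hypothetical rule set.

    Negation and degree words accumulate until the next sentiment word,
    which consumes them. A turning word discounts everything scored so far,
    so the clause after "but" dominates.
    """
    total = 0.0
    sign, degree = 1.0, 1.0
    for tok in tokens:
        if tok in TURNING:
            total *= turn_weight          # de-emphasize the clause before the turn
            sign, degree = 1.0, 1.0       # pending modifiers do not cross the turn
        elif tok in NEGATION:
            sign = -sign                  # each negation flips the pending sign
        elif tok in DEGREE:
            degree *= DEGREE[tok]         # intensifiers/diminishers multiply
        elif tok in SENTIMENT:
            total += sign * degree * SENTIMENT[tok]
            sign, degree = 1.0, 1.0       # modifiers apply to one sentiment word only
    return total


def polarity(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "the screen is clear but the battery is very bad".split()
    s = score_opinion(review)
    print(s, polarity(s))  # -1.0 negative
```

Discounting the pre-turn clause (rather than discarding it) is one design choice among several; a real system would also need to attach each score to its feature word, which this sketch omits.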

【Abstract】 With the wide application of the Internet, blogs, BBS forums, wikis, and other Web sites now contain a large number of customer reviews of products and services. Targeting these Web texts, this thesis studies how to extract product features and opinion words from text and then determine the polarity of the opinions that customers hold. Experiments show that the methods are applicable, and the system developed from them is robust. Our main work is as follows: (1) Using network resources, we first adopt pattern-matching extraction based on HTML tags to extract product features from specific Web pages and build a basic feature dictionary. Second, we crawl review texts via search engines to extract opinion words, and then calculate the polarity of these words based on HowNet to construct a sentiment lexicon with colloquial characteristics. (2) Using Chinese dependency parsing combined with other semantic features, we extract new product features to expand the feature dictionary; then, based on a bipartite graph model, we repeatedly co-train feature words and opinion words, and finally write the new feature words and opinion words into their respective lexicons. At the same time, we write the matched feature and opinion words into a new text as two-tuples. (3) We manually construct tables of negation words, turning (adversative) words, and degree words, then define a scoring model for sentiment words, score each extracted sentiment word, and judge its polarity, that is, the opinion or attitude of the reviewer. Through the above work, this thesis achieves opinion mining of Web texts, namely the extraction of feature words and opinion words and the analysis of positive and negative opinions, and establishes related resources. Finally, the thesis explores how to achieve cross-domain mining; the results show the feasibility of the methods to a certain extent.
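Item 2 describes repeatedly co-training feature words and opinion words over a bipartite graph, without specifying the update rule. The sketch below swaps in a HITS-style mutual-reinforcement iteration as one plausible reading: a feature scores high when high-scoring opinion words modify it, and vice versa. The function names `co_train` and `new_entries`, the co-occurrence-count edge weights, the L2 normalization, the iteration count, and the `threshold` cut-off are all hypothetical assumptions, not details taken from the thesis.

```python
import math
from collections import defaultdict


def co_train(pairs, iterations=20):
    """Mutual reinforcement on a feature-opinion bipartite graph (HITS-style sketch).

    pairs: (feature, opinion) co-occurrences, e.g. harvested from dependency
    relations. Edge weight = co-occurrence count.
    """
    weight = defaultdict(int)
    for feature, opinion in pairs:
        weight[(feature, opinion)] += 1
    if not weight:
        return {}, {}
    f_score = {f: 1.0 for f, _ in weight}
    o_score = {o: 1.0 for _, o in weight}
    for _ in range(iterations):
        new_f = dict.fromkeys(f_score, 0.0)
        for (f, o), w in weight.items():
            new_f[f] += w * o_score[o]      # features gather score from opinions
        new_o = dict.fromkeys(o_score, 0.0)
        for (f, o), w in weight.items():
            new_o[o] += w * new_f[f]        # opinions gather score from features
        # L2-normalize each side so scores stay bounded across iterations
        nf = math.sqrt(sum(v * v for v in new_f.values()))
        no = math.sqrt(sum(v * v for v in new_o.values()))
        f_score = {k: v / nf for k, v in new_f.items()}
        o_score = {k: v / no for k, v in new_o.items()}
    return f_score, o_score


def new_entries(scores, lexicon, threshold):
    """Candidate words scoring >= threshold that are not yet in the lexicon, best first."""
    picked = [w for w, s in scores.items() if s >= threshold and w not in lexicon]
    return sorted(picked, key=lambda w: -scores[w])


if __name__ == "__main__":
    pairs = [("screen", "clear"), ("screen", "clear"), ("screen", "bright"),
             ("battery", "durable"), ("battery", "clear")]
    f_score, o_score = co_train(pairs)
    print(new_entries(o_score, {"clear"}, 0.3))  # ['bright']
```

Because each round is a power-iteration step on the co-occurrence matrix, the scores converge to its dominant singular vectors; words that pass the threshold would be the candidates written back to the lexicons.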
