Research on Sentiment Strength Fuzziness Based on a General Framework for Opinion Mining (基于意见挖掘通用框架的情感极性强度模糊性研究)
【Author】 Kou Guangzeng (寇广增);
【Advisor】 Li Gang (李纲);
【Author information】 Wuhan University, Management Science and Engineering, 2010, PhD
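【Abstract (translated from the Chinese)】 From an epistemological standpoint, information divides into objective information and subjective information. Objective information describes facts; subjective information reflects the views and attitudes of people or organizations toward things or events. In the past, ordinary Internet users only read and received information, so demand was mostly for objective information. Now that the public takes part in creating and publishing online information, demand for subjective information has become more common. Although the Internet holds massive amounts of subjective information, finding and using it is costly, so analysis and processing of subjective information are urgently needed.

Opinion mining identifies and analyzes subjective information using natural language processing, information extraction, data mining, and related techniques. It meets the demand for subjective information analysis and has been applied to online word-of-mouth evaluation, public opinion monitoring, enterprise competitive intelligence analysis, and many other fields. Opinion mining research has produced rich results, but it also faces the following problems: (1) complex information content, (2) diverse information forms, (3) non-standard language, (4) fuzziness, and (5) dependence on specific domains. Among these, research on the fuzziness of sentiment strength has important academic significance and practical value, because it helps reflect subjective information fully and faithfully.

Accordingly, this dissertation handles sentiment strength fuzziness with fuzzy mathematics and evidence theory. Treating the levels of sentiment strength as fuzzy sets, it proposes representing sentiment strength by membership degrees, which measure the degree to which a sentiment orientation belongs to a given strength level, and it computes the membership degrees with the fuzzy statistical method. Treating the combination of opinion summaries as a decision problem, it proposes combining opinion summaries with Dempster's rule of combination. Taking washing machines as an example, the author builds an experimental platform on a general framework for opinion mining, which supplies the data and a baseline for comparison. Worked examples on this platform illustrate the procedure and effect of the fuzziness handling.

The dissertation has six chapters:

(1) Research overview
This chapter summarizes the state of research on opinion mining and sentiment strength. Opinion mining follows two technical routes: one classifies statements by their sentiment orientation; the other extracts and analyzes opinions about evaluated features. In opinion mining research, corpus annotators and linguists find it hard to agree on sentiment orientation and sentiment strength, which indicates fuzziness. There are some results on the fuzziness of sentiment orientation, but few on the fuzziness of sentiment strength.

(2) General framework for opinion mining
This chapter builds the experimental platform on the general framework for opinion mining. The author illustrates the core of the framework with 20 washing machine models; the subjective data, 6006 records in total, come from Jingdong Shangcheng (京东商城). With the feature and sentiment word selection tools, 22 features and 229 sentiment words are extracted. Sentiment analysis reaches an accuracy of 88.47%, the washing machines are ranked by online word of mouth, and the analysis results are generated as opinion summary web pages.

(3) Problem analysis of sentiment strength fuzziness
This chapter analyzes the problem of sentiment strength fuzziness. In opinion mining research, sentiment orientation exhibits ambiguity rather than fuzziness, and this ambiguity can be resolved by removing domain dependence. Sentiment strength, by contrast, exhibits fuzziness, which is hard to eliminate.

(4) Representation of sentiment strength fuzziness
This chapter proposes a representation of sentiment strength fuzziness. Guided by fuzzy mathematics, it treats the levels of sentiment strength as fuzzy sets, uses membership degrees to express the degree to which a sentiment strength belongs to each level, and computes the membership degrees with the fuzzy statistical method. A worked example computes the fuzzy representation of the degree adverbs and sentiment words collected in Chapter 2.

(5) Combination of sentiment strength fuzziness
This chapter proposes a combination method for sentiment strength fuzziness. Guided by evidence theory, it treats the combination of opinion summaries as a decision problem and applies Dempster's rule of combination. The worked example builds on the fuzzy representations of degree adverbs and sentiment words from Chapter 4 and uses the "Panasonic XQB60-P620U6" washing machine to illustrate the method.

(6) Conclusion
This chapter summarizes the research and looks ahead. A comparison of the results after fuzziness handling (Chapter 5) with the results without it (Chapter 2) indicates that handling sentiment strength fuzziness is reasonable and effective. Finally, the next steps of the research are discussed.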
【Abstract】 From the perspective of epistemology, information can be divided into objective information and subjective information. The former describes facts, and the latter reflects the viewpoints and attitudes of individuals or organizations.

In the past, ordinary Internet users mostly browsed and received information, so the demand was mainly for objective information. As more people contribute to creating and distributing information on the Internet, the need for subjective information has become more common. The growth of the Internet has generated massive volumes of subjective information, but the cost of finding and using it is high, which creates an urgent need to analyze subjective information.

Opinion mining recognizes and analyzes subjective information using techniques such as natural language processing, information extraction, and data mining. Its emergence meets the need for subjective information analysis, and it has been applied to word-of-mouth analysis, public opinion analysis, enterprise competitive intelligence, and other fields.

Opinion mining has produced abundant research results, but it still faces the following problems: 1) complexity of the information, 2) varied forms of information, 3) non-standard language, 4) fuzziness, 5) domain dependence, etc. The fourth problem has important academic significance and practical value; handling it well helps reflect subjective information comprehensively and faithfully.

This dissertation studies sentiment strength fuzziness using fuzzy mathematics and evidence theory. It treats the grades of sentiment strength as fuzzy sets, uses membership degrees to express sentiment strength, and uses the fuzzy statistical method to calculate the membership degrees. Opinion summarization is regarded as a decision-making problem in which Dempster's rule of combination is used to combine opinions.

To this end, the author built an experimental platform based on a general framework for opinion mining and took washing machines as an example, which provides baseline data for research and comparison. On this platform, the procedures and effects of the method are illustrated with examples.

The dissertation is composed of six chapters as follows.

(1) Overview
This chapter summarizes the state of research on opinion mining and sentiment strength. Opinion mining techniques fall into two categories: one focuses on classification based on the sentiment orientation of statements; the other extracts and analyzes opinions about evaluated features. In this research, corpus annotators and linguists find it difficult to agree on sentiment orientation and sentiment strength, which indicates fuzziness. Research on the fuzziness of sentiment orientation has made some progress, but studies of sentiment strength fuzziness are rare.

(2) Opinion Mining General Framework
This chapter establishes an experimental platform using the opinion mining general framework. The core parts of the framework are illustrated using 20 washing machines as test objects, with a total of 6006 subjective records from the Jingdong Shangcheng website. Using feature and sentiment word selection tools, 22 features and 229 sentiment words are extracted. The analysis results are then generated as web pages. The accuracy of sentiment analysis is 88.47%.

(3) Analysis of Sentiment Strength Fuzziness
This chapter analyzes sentiment strength fuzziness. Both the objects toward which people express their viewpoints and the language they use to express them exhibit fuzziness, so the analysis also tends to be fuzzy. In opinion mining research, sentiment orientation shows ambiguity rather than fuzziness, and this ambiguity can be resolved by eliminating domain dependence; the fuzziness of sentiment strength cannot be resolved in this way.

(4) Representation of Sentiment Strength Fuzziness
This chapter introduces a method for representing sentiment strength fuzziness. Guided by fuzzy set theory, it treats the grades of sentiment strength as fuzzy sets, uses membership degrees to express how far a sentiment strength belongs to a given grade, and uses the fuzzy statistical method to calculate the membership degrees. The degree adverbs and sentiment words from Chapter 2 serve as the example that illustrates the method.

(5) Combination of Sentiment Strength Fuzziness
This chapter introduces a method for combining sentiment strength fuzziness. Guided by evidence theory, opinion summarization is regarded as a decision-making problem, and Dempster's rule of combination is used to combine opinions. Based on the results of Chapter 4, an example illustrates the combination method using the "Panasonic XQB60-P620U6" washing machine as the test object.

(6) Conclusion
This chapter summarizes the research and its prospects. A comparative analysis of the new results in Chapter 5 and the earlier results in Chapter 2 indicates that the new method is reasonable and valid. Finally, directions for further research are discussed.
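The representation step described in chapter (4) can be illustrated with a minimal Python sketch of the fuzzy statistical method. It rests on stated assumptions: the grade names (weak, medium, strong), the function name `membership_degrees`, and the annotator votes are all hypothetical, since the abstract gives neither the grades nor the survey data. Each annotator assigns a word to one grade, and the membership degree of the word in a grade is estimated as the fraction of annotators who chose that grade.

```python
from collections import Counter

# Hypothetical strength grades; the abstract does not name the grades it uses.
GRADES = ("weak", "medium", "strong")


def membership_degrees(judgments, grades=GRADES):
    """Estimate membership degrees by the fuzzy statistical method.

    Each judgment is one annotator's choice of grade for a word. The
    membership degree of the word in a grade is the fraction of
    judgments that chose that grade.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    counts = Counter(judgments)
    unknown = set(counts) - set(grades)
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    total = len(judgments)
    # Counter returns 0 for grades nobody chose, so every grade gets a value.
    return {grade: counts[grade] / total for grade in grades}


# Toy data invented for illustration: ten annotators grade one sentiment word.
votes = ["strong"] * 6 + ["medium"] * 3 + ["weak"]
degrees = membership_degrees(votes)
print(degrees)
```

With these toy votes the word belongs mostly to the strong grade while keeping nonzero membership in the other two, which is the kind of graded description that a single hard label would lose.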
【Key words】 Opinion Mining; Sentiment Analysis; Fuzziness; Sentiment Strength;
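The combination step described in chapter (5) of the abstract relies on Dempster's rule of combination. For two mass functions m1 and m2 over the same frame, the combined mass of a non-empty set A is the sum of m1(B)·m2(C) over all pairs with B ∩ C = A, divided by 1 − K, where K is the total mass of pairs whose intersection is empty (the conflict). The Python sketch below is an illustration under assumptions, not the dissertation's implementation: the abstract does not say how mass functions are built, so using membership degrees directly as masses on single grades, the function name `dempster_combine`, and all numbers are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass, and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Mass assigned to contradictory pairs is the conflict K.
            conflict += mass_b * mass_c
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("the two sources are in total conflict")
    return {focal: mass / norm for focal, mass in combined.items()}


# Toy evidence invented for illustration: two opinions about one product
# feature, each expressed as masses over hypothetical strength grades.
m1 = {frozenset({"strong"}): 0.6, frozenset({"medium"}): 0.3, frozenset({"weak"}): 0.1}
m2 = {frozenset({"strong"}): 0.5, frozenset({"medium"}): 0.4, frozenset({"weak"}): 0.1}

result = dempster_combine(m1, m2)
for focal, mass in sorted(result.items(), key=lambda item: -item[1]):
    print(sorted(focal), round(mass, 4))
```

Because both toy opinions lean toward the strong grade, the combined mass concentrates there more than in either input; focal elements containing several grades could be used in the same way to express partial ignorance.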