

Study of Knowledge Discovery of Opinions from Web Reviews

【Author】 Chen Xiaomei (陈晓美)

【Advisor】 Bi Qiang (毕强)

【Author information】 Jilin University, Information Science, 2014, PhD

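【Abstract (translated from the Chinese)】 In China today there are objectively two fields of public opinion: one carried by mainstream media such as newspapers, radio, and television, and a grassroots one built on the Internet and the Web 2.0 applications that have emerged in recent years. In the Web 2.0 environment, Internet-based public opinion platforms have moved beyond website news comments and BBS forums to include new forms such as news aggregation (RSS), Wikipedia (Wiki), instant messaging (IM) tools such as QQ, blogs and microblogs, podcasts, and comprehensive commerce platforms such as Taobao and EachNet (易趣), so the volume of review information on the Web has grown rapidly. The number of Internet users in China has now reached a plateau, mobile phones have become the main source of new users, and micro-content such as microblogs and online communities has become the main source of opinions in Web reviews. Timeliness, openness, interactivity, thoughtfulness, and a grassroots character have become the new features of Web review information. These features deeply affect every area of people's lives, have changed the mechanisms by which public opinion forms, evolves, and aggregates, and have widened the space in which it spreads.

In the Web 2.0 environment, people generally feel that obtaining opinions has become as important as obtaining information, yet extracting valuable opinion information is increasingly difficult. There are two reasons. First, because reviewers write from different angles and with different purposes, reviews often mix positive and negative opinions, and extracting review information accurately takes much time and effort. Second, the information sources of the Web 2.0 grassroots opinion field are heavily polluted: the subjective information in Web reviews is varied, disorderly, and uneven in quality, while traditional Web public opinion analysis techniques (aimed mainly at web pages and forums) have limited ability to handle the more dynamic and structurally complex Web 2.0 applications. They cannot capture these deeper elements of public opinion or verify whether information is genuine, which degrades the analysis of Web review information. Analyzing Web 2.0 review information therefore helps uncover the opinions behind Web reviews, provides deeper and richer information support for decision-making and forecasting, and enriches the theoretical framework of Web review information analysis.

Taking the information sources of the Web 2.0 grassroots opinion field as its logical starting point, this thesis combines theories and methods from text mining, opinion mining, knowledge discovery, LDA topic models, and ontology learning to study, from the perspective of topic clustering, the analysis model for Web review information and the theory, techniques, methods, and applications of opinion mining.

The main work of the thesis is as follows:
(1) It reviews, fairly comprehensively and systematically, the state of research in China and abroad, the research hotspots and frontiers, and the application progress related to the topic. It surveys and analyzes the theories and methods relevant to knowledge discovery of opinions from Web reviews, laying the theoretical and methodological foundation for the study.
(2) It takes the discovery of aspect-sentiment associations in explicit opinions as the basis for opinion mining of unstructured review text. The semi-structured explicit opinions provided by websites are used to extract the aspects of the reviewed object, the sentiment polarity, and the pairings between the two, and to build an opinion knowledge base. This addresses, to some extent, the context sensitivity of sentiment words, and the knowledge base serves as the basis for opinion mining of unstructured review text and supports the complete mining process.
(3) It proposes the main tasks and solutions for knowledge discovery from Web reviews based on LDA (Latent Dirichlet Allocation) topic clustering, including clustering similar review texts, extracting the main opinions of reviews, and identifying deep opinions.
(4) From a cognitive perspective, it analyzes the regularities of knowledge discovery from Web reviews oriented to implicit cognition. On this basis, with domain knowledge at the core, it merges general mining based on opinion words with deep mining based on topics to build a multi-base integrated model for knowledge discovery of opinions from Web reviews.
(5) It conducts an empirical study using opinion mining of Web reviews in the education domain, providing a useful reference for applied research.

The innovative results of the thesis fall into three areas:
(1) It builds an ontology-based opinion knowledge base and proposes an opinion mining model based on it, which helps with implicit opinion identification and context sensitivity and can help improve the dynamic extensibility of domain dictionaries.
(2) From the perspective of topic clustering, it uses the LDA topic model together with algorithms that integrate opinion separation and opinion summarization to propose methods for identifying main opinions and discovering deep opinions in Web reviews.
(3) It merges general mining based on opinion words with deep mining based on topics, complementing each with domain knowledge, to build an opinion, domain knowledge, and topic multi-base integrated model for knowledge discovery of opinions from Web reviews.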
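The opinion knowledge base described in item (2) of the abstract can be illustrated with a small sketch. The abstract does not give the data format or the construction algorithm, so everything here is an assumption made for illustration: the triple layout `(aspect, opinion word, polarity)`, the education-domain sample data, the majority-vote rule, and the function names `build_knowledge_base` and `lookup_polarity` are all hypothetical. The point it shows is why pairing an opinion word with an aspect helps with context sensitivity: the same word ("high") can carry opposite polarity for different aspects.

```python
from collections import Counter, defaultdict

# Hypothetical semi-structured explicit opinions, e.g. taken from the
# "pros"/"cons" fields of a review site: (aspect, opinion word, polarity).
EXPLICIT_OPINIONS = [
    ("tuition", "high", "negative"),
    ("tuition", "high", "negative"),
    ("tuition", "high", "positive"),           # a noisy label, outvoted below
    ("teaching quality", "high", "positive"),
    ("dormitory", "old", "negative"),
]


def build_knowledge_base(explicit_opinions):
    """Map each (aspect, opinion word) pair to its majority polarity."""
    votes = defaultdict(Counter)
    for aspect, word, polarity in explicit_opinions:
        votes[(aspect, word)][polarity] += 1
    # Assumed rule: keep the most frequent polarity seen for each pair.
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


def lookup_polarity(kb, aspect, word):
    """Return the stored polarity for the pair, or None if it was never seen."""
    return kb.get((aspect, word))


kb = build_knowledge_base(EXPLICIT_OPINIONS)
print(lookup_polarity(kb, "tuition", "high"))           # negative
print(lookup_polarity(kb, "teaching quality", "high"))  # positive
print(lookup_polarity(kb, "campus", "high"))            # None
```

A lookup table keyed on the pair, rather than on the opinion word alone, is the simplest structure that captures the aspect-sentiment pairing; the ontology-based knowledge base the thesis describes is presumably richer than this.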

【Abstract】 Today there are objectively two fields of public opinion in China. One is the societal public opinion field carried by mainstream media, including newspapers, radio, and television. The other is the grassroots opinion platform based on the Internet and Web 2.0 applications. In the new Web 2.0 environment, many new forms of communication have been added to the public opinion platform. Reviews on the platform have increased rapidly, driven both by the original news commentary and BBS forums and by the new forms of communication, which include news aggregation (RSS), Wikipedia (Wiki), instant messaging (IM) tools such as QQ, microblogs, podcasts, and commerce platforms such as Taobao and eBay. Growth in the number of Internet users has recently reached a plateau, with mobile phones playing the most important role among new users. Microblogs and online communities have also become sources of the opinions expressed in online reviews. The new features of this information, namely timeliness, openness, interactivity, thoughtfulness, and a grassroots character, have influenced many aspects of people's lives. They have also changed the mechanisms by which public opinion forms, evolves, and aggregates, and have expanded the space in which public opinion spreads.

In the Web 2.0 environment, obtaining opinions has become as important as obtaining information, yet obtaining valuable opinion information has become much more difficult. There are two reasons. First, different people comment from different angles and for different purposes, so reviews often mix positive and negative opinions, and capturing review information accurately costs a lot of time and energy. Second, the information on Web 2.0 public opinion platforms is heavily polluted: the subjective content of Web reviews is varied, disorderly, and of uneven quality. Traditional Web public opinion analysis technology, which mainly targets web pages and BBS forums, is limited in its ability to analyze Web 2.0 applications. Its inability to obtain reliable information weakens the analysis of Web review information. It is therefore necessary to study Web 2.0 review information. Doing so helps uncover the opinion information hidden in Web reviews, supports decision-making and forecasting, and enriches the theoretical framework of Web review information analysis.

Taking Web 2.0 public opinion platforms as its logical starting point, this thesis presents an in-depth, systematic study, from the perspective of topic clustering, of analysis models for Web reviews and of the theory, technology, and methods of opinion mining from Web review information. It draws on text mining, opinion mining, knowledge discovery, the LDA topic model, ontology learning, and visualization techniques.

The main work of this thesis is as follows:
(1) A comprehensive review of research at home and abroad on the selected topic, covering research hotspots and frontiers as well as application progress.
(2) Opinion mining from unstructured text is based on discovering aspect-sentiment relationships: semi-structured explicit opinions are used to extract the aspects of the reviewed objects, the sentiment polarity, and the relationships between them, and an opinion knowledge base is constructed. This helps address the context sensitivity of sentiment words and supports the whole opinion mining process.
(3) A method for knowledge discovery from Web reviews based on the LDA topic clustering model is put forward, including clustering of similar review texts, recognition of main opinions, and judgment of deep opinions.
(4) From a cognitive perspective, the regularities of knowledge discovery from Web reviews for implicit cognition are analyzed, and a discovery model for Web reviews based on LDA and opinion mining is developed. Opinion mining of Web reviews in the education domain is used as a case study, which provides a useful reference for applied research.

The innovations of this research fall into three areas:
(1) An ontology-based opinion knowledge base is constructed, and an opinion mining method based on it is put forward, which helps resolve the context sensitivity problem and improves the dynamic extensibility of the domain dictionary.
(2) From the perspective of topic clustering, using the LDA topic model and combining algorithms for opinion separation and integration, methods for main opinion recognition and deep opinion judgment are proposed.
(3) By combining general mining based on opinion words with deep mining based on topics, complemented by domain knowledge, a knowledge discovery model for Web reviews that integrates opinion, domain knowledge, and topic bases is constructed.
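The LDA topic clustering of similar review texts named in item (3) of the main work can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the model is fitted, so collapsed Gibbs sampling is swapped in as one standard inference method, and the function names `lda_gibbs` and `cluster_by_topic`, the hyperparameters `alpha` and `beta`, and the toy tokenized reviews are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def cluster_by_topic(theta):
    """Assign each document to its highest-probability topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Hypothetical, already tokenized education-domain reviews.
docs = [
    ["teacher", "lecture", "clear", "teacher", "helpful"],
    ["lecture", "teacher", "boring", "lecture"],
    ["dormitory", "canteen", "food", "dormitory", "old"],
    ["canteen", "food", "cheap", "dormitory"],
]
theta = lda_gibbs(docs, num_topics=2)
clusters = cluster_by_topic(theta)
print(clusters)
```

Grouping each review under its dominant topic is the simplest way to turn topic mixtures into clusters. In practice an established library implementation of LDA would normally replace this hand-written sampler, and the steps the abstract lists after clustering (main opinion recognition and deep opinion judgment) are not covered here.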

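  • 【Online publication submitter】 Jilin University
  • 【Online publication year and issue】 2014, Issue 09
  • 【Classification number】 G206; G358
  • 【Citation count】 1
  • 【Download count】 1516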

