Tendency Analysis of Subjective Comments Towards BBS (original title: BBS主观倾向分析)

【Author】 Tang Guo (唐果)

【Advisor】 Chen Honggang (陈宏刚)

【Author Information】 Southwest University, Computer Software and Theory, 2010, Master's

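【Abstract, translated from the Chinese】 In the development of socialist democratic politics and a harmonious society, BBS forums have become an important platform for people to exchange opinions and post comments. BBS subjective tendency analysis is one of the key means of supervising BBS public opinion: it supports the timely collection of BBS public-opinion information, captures the viewpoints, attitudes and sentiment tendencies expressed in comments on hot BBS topics, helps supervise and clean up the BBS environment, and shows Party and government bodies and related departments where public opinion is leaning so that they can decide quickly and on a sound basis. Abroad, the "感情色彩" (sentiment) software of the British software company 科波拉 (Coppola) can judge the evaluative attitude and sentiment tendency that media articles take toward party policies, or that comments take toward online products. In China, the 方正智思 (Fangzheng Zhisi) public opinion monitoring and analysis system helps supervisory departments assess, analyze and plan online public-opinion content and produce early-warning information.
Research on BBS text tendency, whether based on machine learning or on semantic patterns, treats a document as a set of words or patterns: it computes or looks up the tendency value of each phrase or pattern and sums the results to obtain the tendency value of the comment document being judged. These approaches do not refine opinion targets and pair them with their corresponding polarity, and they ignore the linkage between subject-predicate and verb-object structures in sentence syntax. As a result, the polarity of the sentiment words attached to hot-topic words is misjudged and the BBS text tendency analysis is inaccurate. Data acquisition for BBS subjective tendency analysis is complex and varied; it is usually tied to the hot topics under discussion and is casual, wide-ranging, domain-specific and time-sensitive.
This thesis therefore first refines the opinion targets of BBS topics and pairs them with their polarity tendencies. It then combines a polarity sentiment dictionary, syntax-based dependency parsing and a topic polarity identification algorithm to perform BBS subjective tendency analysis, using an improved context-based tendency analysis method to compute topic polarity values. Finally, it analyzes and discovers polarity topics, focus topics and sensitive topics, uses the change of tendency dispersion over time to detect topic trends, and runs comparative experiments to verify that the proposed method is more effective and feasible in terms of the accuracy of topic identification and of the corresponding polarity judgments.
Main work:
(1) Unstructured BBS text is extracted using HTML and the DOM, filtered for stop words, segmented into Chinese words, and stored as XML.
(2) A method based on a polarity sentiment dictionary, dependency parsing and a topic polarity identification algorithm is proposed; it analyzes the polarity of topic words and of their corresponding sentiment words in order to perform BBS subjective tendency analysis. Positive, negative and negation dictionaries are built and merged, sentence tendency values are computed to extract the topic tendency sentences (those containing sentiment descriptions) from BBS comments, and the topic polarity identification algorithm is used to compute context-based word polarity values.
(3) An improved method for computing contextual polarity is proposed. By adding topic identification marks and the linkage between subject-predicate and verb-object structures, it compensates for the topic-word misidentification and polarity misjudgment of the SBV (Subject-Verb, the subject-predicate relation in syntax) polarity propagation algorithm.
(4) Key points of BBS subjective tendency are analyzed to discover polarity topics, focus topics and sensitive topics; tendency dispersion, focus degree and sensitivity degree are defined, and the change of tendency dispersion over time is used to analyze and discover topic trends.
(5) Comparative experiments verify that the proposed BBS subjective tendency analysis method is more effective and feasible in terms of the accuracy of topic identification and of the corresponding polarity judgments.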
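Item (2) of the main work computes a sentence tendency value from positive, negative and negation dictionaries. The abstract does not give the formula, so the following is only a minimal sketch of one common way to do this, under stated assumptions: the toy lexicons `POSITIVE`, `NEGATIVE` and `NEGATORS`, the function name `sentence_tendency`, the `window` parameter and the sign-flip rule are all hypothetical, and English tokens stand in for the segmented Chinese words the thesis works with.

```python
# Hypothetical toy lexicons; the thesis's actual dictionaries are not given.
POSITIVE = {"good": 1.0, "support": 1.0, "excellent": 2.0}
NEGATIVE = {"bad": -1.0, "corrupt": -2.0, "slow": -1.0}
NEGATORS = {"not", "never", "no"}


def sentence_tendency(tokens, window=2):
    """Sum the polarity of the sentiment words in a tokenized sentence.

    Assumed rule (not from the thesis): a sentiment word's sign is flipped
    when an odd number of negators occurs in the `window` tokens before it.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = POSITIVE.get(token, NEGATIVE.get(token))
        if polarity is None:
            continue  # not a sentiment word
        preceding = tokens[max(0, i - window):i]
        negations = sum(1 for t in preceding if t in NEGATORS)
        if negations % 2 == 1:
            polarity = -polarity
        score += polarity
    return score


if __name__ == "__main__":
    print(sentence_tendency(["the", "service", "is", "not", "good"]))
    print(sentence_tendency(["excellent", "support"]))
```

A sentence with a nonzero score would count as a candidate "topic tendency sentence" in this sketch. Flipping on odd negation counts is a simplification that treats a double negation as restoring the original polarity.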

【Abstract】 As socialist democracy and a harmonious society develop, the Internet has become an important venue for exchanging views and comments, and BBS forums in particular are becoming a platform where people speak freely and express their personal views and attitudes. BBS content carries a large amount of information about public opinion. Collecting this information quickly, tracking public views, attitudes and subjective comments on the topics of greatest concern in each period, and monitoring BBS public opinion help the government make sound, well-informed decisions. Text tendency analysis has advanced considerably in identifying overall attitudes and in analyzing the subjective tendency of comment texts and product reviews. Overseas, the Emotion Analysis software made by the British company Coppola Software can determine whether a newspaper article on party policy, or a commentary on an online product, takes a positive or a negative attitude. In China, the Fangzheng public opinion monitoring and analysis system helps monitoring departments assess, analyze and plan public opinion information so as to provide early warnings.
Existing BBS text tendency analysis, whether based on semantics or on machine learning, treats a document as a set of words or patterns, computes or looks up the tendency value of each phrase or pattern, and adds the results together to obtain the polarity value of the text. These methods neither refine the evaluation targets of BBS topics and match them with their corresponding polarity tendencies, nor take into account the relationship between Subject-Predicate and Verb-Object structures. This thesis therefore combines a polarity dictionary, dependency parsing and a corresponding algorithm to analyze the tendency of subjective comments on BBS, and then establishes mathematical models for detecting and analyzing topic trends.
Main tasks:
(1) Information extraction based on HTML and the DOM tree is used to extract unstructured BBS text, which is then filtered for stop words, segmented into Chinese words, and stored in XML.
(2) A sentiment dictionary and dependency parsing are combined with a polarity tendency identification algorithm to identify topic words and the polarity of their corresponding sentiment words, so that the polarity tendency of topic-oriented sentences in subjective BBS comments can be computed and analyzed. Positive, negative and negation dictionaries are integrated to compute the tendency value of the topic tendency sentences in BBS comments, and the tendency identification algorithm is used to compute context-based polarity values and, from them, the polarity tendency value of each topic.
(3) An improved method for computing and analyzing contextual polarity is proposed. It adds topic identification marks and the relationship between Subject-Predicate and Verb-Object structures to the SBV (Subject-Verb, the Subject-Predicate structure in syntax) algorithm, in order to correct that algorithm's errors in identifying topic words and in judging polarity tendency.
(4) Based on the BBS tendency analysis, a polarity theme model, a focus theme model and a sensitive theme model are established to find theme trends.
(5) Comparative experiments show that the BBS tendency analysis approach proposed in this thesis is more effective and feasible.
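Task (3) ties polarity to topic words through Subject-Predicate (SBV) and Verb-Object dependency relations. The abstract does not spell out the rules, so the sketch below is only one possible reading and not the thesis's algorithm. Everything in it is an assumption: the `(head, dependent, relation)` arc format, the `VOB` label for verb-object arcs, the toy `LEXICON`, the function name `topic_polarities`, and the rule that a marked topic word in object position takes the predicate's polarity before the subject does.

```python
from collections import defaultdict

# Hypothetical polarity lexicon for predicates; not taken from the thesis.
LEXICON = {"praise": 1.0, "excellent": 1.0, "criticize": -1.0, "terrible": -1.0}


def topic_polarities(arcs, topics):
    """Assign predicate polarity to marked topic words.

    arcs:   (head, dependent, relation) triples from a dependency parse,
            where relation is "SBV" (subject) or "VOB" (object); assumed format.
    topics: set of words already marked as topic words.
    Returns a dict mapping each affected topic word to its summed polarity.
    """
    subjects = defaultdict(list)  # predicate -> its SBV dependents
    objects = defaultdict(list)   # predicate -> its VOB dependents
    for head, dependent, relation in arcs:
        if relation == "SBV":
            subjects[head].append(dependent)
        elif relation == "VOB":
            objects[head].append(dependent)

    scores = defaultdict(float)
    for predicate in set(subjects) | set(objects):
        polarity = LEXICON.get(predicate, 0.0)
        if polarity == 0.0:
            continue  # predicate carries no sentiment
        # Assumed linking rule: a topic word in object position is the target;
        # only if there is none does the polarity pass to a topic-word subject.
        targets = [w for w in objects.get(predicate, []) if w in topics]
        if not targets:
            targets = [w for w in subjects.get(predicate, []) if w in topics]
        for word in targets:
            scores[word] += polarity
    return dict(scores)


if __name__ == "__main__":
    # "users criticize policy": the polarity lands on the object, not the subject.
    arcs = [("criticize", "users", "SBV"), ("criticize", "policy", "VOB")]
    print(topic_polarities(arcs, {"users", "policy"}))
```

In this sketch, a propagation rule that looked at SBV arcs alone would attach the negative polarity of "criticize" to "users"; consulting the verb-object arc and the topic marks redirects it to "policy", which is the kind of misassignment the abstract says the improved method addresses.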

  • 【Online Publication Contributor】 Southwest University
  • 【Online Publication Issue】 2011, Issue 05