Web多文档自动文摘研究

Research of Web Multi-document Automatic Summarization

【Author】 付红艳

【Advisor】 张文燚

【Author Information】 Harbin Engineering University, Computer Software and Theory, 2010, Master's thesis

【Abstract】 The Political Party Diplomacy Auxiliary Decision Support System is an intelligent clustering search system: given input keywords, it retrieves a large set of documents on the same topic and presents automatically generated summaries of them, so that users can browse the information quickly and make timely, accurate decisions. Automatic summarization is one component of this system, and this research was undertaken to optimize the system further. Web multi-document automatic summarization aims to present comprehensive and concise information to users and to save them browsing time. At present there are two main classes of multi-document summarization methods. The first ranks all sentences in the document set together by weight and selects summary sentences in order according to the compression ratio. The second divides the document set into several local topics and then selects summary sentences from the different local topics. Because users require summaries that are both comprehensive and concise, this thesis concentrates on the second class. It examines four aspects of multi-document summarization: similarity computation, local topic division, summary sentence selection, and summary sentence ordering. Building on this analysis, the thesis improves the methods for selecting and ordering summary sentences under local topic division. The main contributions are: an improved method for computing the semantic distance between words, together with a sentence similarity measure that fuses Euclidean distance with semantic distance; an optimized k-medoids algorithm that uses sentence density to discover the seed points and the number of clusters automatically; improved methods for scoring local topics and for judging sentence information coverage, which together refine the iterative summary sentence selection strategy; and an improved three-layer ordering method that builds on the two-layer ordering method. Finally, the algorithms are applied in a Web multi-document automatic summarization system, and experiments and an analysis of the results are reported.
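The local-topic approach described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation, and several simplifications are assumptions made here for illustration only: sentence similarity is plain bag-of-words cosine similarity rather than the fused Euclidean and semantic distance, seed discovery is a simple neighbour-count heuristic, one sentence (the medoid) is taken per local topic instead of the iterative coverage-based selection, and summary sentences are ordered by original position rather than by the three-layer method. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def density_seeds(sim, threshold):
    """Choose seeds by density (illustrative heuristic, not the thesis's).

    Repeatedly take the remaining sentence with the most neighbours
    (similarity >= threshold), then drop it and its neighbours.
    The number of seeds found becomes the number of clusters.
    """
    remaining = set(range(len(sim)))
    seeds = []
    while remaining:
        best = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if sim[i][j] >= threshold),
        )
        seeds.append(best)
        remaining -= {j for j in remaining if sim[best][j] >= threshold} | {best}
    return seeds


def k_medoids(sim, medoids, max_iter=20):
    """Basic k-medoids on a similarity matrix, started from the given seeds."""
    clusters = {}
    for _ in range(max_iter):
        # Assign every sentence to its most similar medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(sim)):
            clusters[max(medoids, key=lambda m: sim[i][m])].append(i)
        # Move each medoid to the member most similar to its whole cluster.
        new = [
            max(members, key=lambda c: sum(sim[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new == medoids:
            break
        medoids = new
    return clusters


def summarize(sentences, threshold=0.3):
    """Return one representative sentence per local topic, in source order."""
    bags = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(a, b) for b in bags] for a in bags]
    clusters = k_medoids(sim, density_seeds(sim, threshold))
    return [sentences[m] for m in sorted(clusters)]
```

Calling `summarize` on a list of sentences returns one sentence per discovered cluster. Taking a sentence from every local topic is what keeps the summary comprehensive, while taking only one per topic keeps it concise, which is the motivation the abstract gives for preferring the second class of methods.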