中文自动文摘关键技术的研究与实现

Research and Implementation of Chinese Automatic Abstracting

【Author】 乔小斐

【Advisor】 陈平

【Author information】 Xidian University, Computer Software and Theory, 2010, Master's thesis

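【Abstract (translated from Chinese)】 Existing Chinese automatic abstracting techniques suffer from incomplete coverage of the source text and from redundant information. This thesis addresses these problems. Building on an existing "statistical full-segmentation Chinese word segmentation system", it first proposes a reverse matching algorithm for the longest combination pattern, based on a general-purpose segmentation dictionary, to correct the overly fine segmentation granularity of that dictionary. Features are then computed and filtered on the segmented text so that the text is represented by feature words. Next, a sentence weighting function based on formal features is designed and applied in the sentence-splitting process. Drawing on the idea of Maximal Marginal Relevance (MMR), an MMR formula adapted to automatic abstracting is proposed to reduce redundancy in the abstract; the formula serves as the sentence evaluation criterion, and a new algorithm for selecting abstract sentences is derived from it. Finally, the thesis describes the design and implementation of a Chinese automatic abstracting system, and experiments show that the abstracts extracted by this system are complete and have low redundancy.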

【Abstract】 Chinese automatic abstracting has developed rapidly over the last 20 years. However, current techniques still have limitations, which show up as incomplete coverage of the source text and high redundancy in the generated abstracts. This thesis addresses these limitations. First, a reverse maximum matching method for the longest word combination, based on a general-purpose segmentation dictionary, is proposed to correct overly fine-grained segmentation; feature terms are then computed and filtered. Next, a sentence weighting function is derived by combining earlier researchers' work with the formal features of the text, and it is applied in the sentence segmentation algorithm. An MMR equation is also designed, based on the maximal marginal relevance (MMR) idea, and is used in a new method for selecting abstract sentences in order to reduce redundancy. Finally, a Chinese automatic abstracting system is designed and implemented. Experiments indicate that the abstracts produced by the system are complete and have low redundancy.
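
To make the MMR idea mentioned in the abstract concrete, the following is a minimal sketch of greedy MMR sentence selection. It is not the thesis's formula, which the abstract does not give: it assumes the standard MMR trade-off, with the whole document standing in for the query, cosine similarity over raw term counts, and a hypothetical weight `lam`. The names `cosine` and `mmr_select` are illustrative only.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, k, lam=0.7):
    """Greedily pick up to k sentence indices by MMR.

    sentences: list of token lists (assumed to be already word-segmented).
    lam: assumed trade-off weight; 1.0 means relevance only.
    Returns the chosen indices in document order.
    """
    vectors = [Counter(tokens) for tokens in sentences]

    # Assumption: relevance is measured against the whole document's term counts.
    doc = Counter()
    for vec in vectors:
        doc.update(vec)

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(vectors[i], doc)
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

With a lower `lam`, a sentence that repeats one already selected is penalised and a less relevant but novel sentence is preferred; with `lam=1.0` the selection reduces to plain relevance ranking.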
