

Research on Method of Mining Knowledge Elements Based on Citation Chain

【Author】 赵火军

【Advisor】 温有奎

【Author information】 Xidian University, Information Science, 2009, Master's thesis

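【Abstract (translated from the Chinese)】 The knowledge elements of a document are implicit and have no unified standard. How to define them and extract them effectively has become a topic of growing interest to researchers and an important research direction in text mining. This paper uses citation association to extract a document's knowledge elements, which makes it possible to get past the document as a barrier, go inside it, and evaluate the structure of its content, so that evaluation deepens from the traditional document-level unit to the level of the document's knowledge elements. First, the paper reviews the state of research in China and abroad and analyzes the regularities of citation indexing; on that basis it extracts the characteristic sentences of related documents and, using a sentence similarity calculation, extracts the characteristic sentences of the corresponding references, storing the two sets in two database tables. Second, it extracts triples from the characteristic sentences according to custom rules and represents them as ontologies, again stored separately in two further database tables. Third, it proposes a double-weight ontology similarity calculation method for comparing the similarity between a document's knowledge elements and those of the corresponding references. Next, it illustrates the above steps with a concrete example and gives the experimental results. Finally, it summarizes the innovative work, analyzes the shortcomings, and discusses future work. The innovations are mainly: (1) building on the citation chain, the paper proposes the idea of using citation association to extract and link document knowledge elements, overcoming the earlier limitation that only whole documents could be evaluated; (2) the paper proposes a double-weight ontology similarity calculation method that can quickly and accurately compute the similarity between a document's knowledge elements and those of the corresponding references.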

【Abstract】 The knowledge elements in a document are implicit, and there is no uniform standard for them. How to define and accurately extract the knowledge elements implied in a document is attracting growing attention from researchers and is an important research direction in text mining. This paper adopts a citation-relevance method, based on the citation chain, to extract the knowledge elements in documents. The method bypasses the document as a threshold and goes inside it to assess the structure of its contents, so that assessment deepens from the traditional practice of treating the document as the unit to treating the document's knowledge elements as the unit. First, the paper summarizes the present state of the field at home and abroad, analyzes the regularities of citation indexing, and extracts the characteristic sentences of related papers; it then extracts the characteristic sentences of the corresponding references by sentence similarity calculation, and stores the two sets in two database tables. Second, it extracts triples from the characteristic sentences according to custom rules and expresses them as ontologies, likewise stored in two further database tables. Third, it presents a double-weight ontology similarity calculation method for comparing the knowledge elements in scientific documents with those in the corresponding references. Fourth, it illustrates the above steps with specific examples and reports the experimental results. Finally, it summarizes the innovative work, analyzes the paper's deficiencies, and discusses future work. The innovations are mainly: (1) the paper presents a method of mining and relating knowledge elements based on the citation chain, overcoming the past limitation of evaluating a scientific document only as a whole; (2) the paper presents a double-weight ontology similarity calculation method that can quickly and accurately calculate the similarity between the knowledge elements in scientific documents and those in the corresponding references.
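
The pipeline summarized in the abstract (match citing sentences to reference sentences, represent knowledge elements as triples, compare them with two sets of weights) can be sketched roughly as follows. This is only an illustrative sketch: the abstract does not specify the sentence similarity measure or what the two weights are, so the cosine measure over word counts, the per-slot weights `slot_weights`, the per-triple weights `triple_weights`, and all function names here are assumptions, not the thesis's actual definitions.

```python
from collections import Counter
from math import sqrt
from typing import List, Optional, Sequence, Tuple

# Assumed model: a knowledge element is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over lower-cased whitespace tokens (assumed measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_match(citing_sentence: str, reference_sentences: List[str]) -> str:
    """Pick the reference sentence most similar to the citing sentence."""
    return max(reference_sentences, key=lambda s: cosine_similarity(citing_sentence, s))


def triple_similarity(t1: Triple, t2: Triple,
                      slot_weights: Sequence[float] = (0.4, 0.2, 0.4)) -> float:
    """First (hypothetical) weight set: weight subject, predicate, object slots."""
    return sum(w * cosine_similarity(x, y) for w, x, y in zip(slot_weights, t1, t2))


def element_set_similarity(doc_triples: List[Triple], ref_triples: List[Triple],
                           triple_weights: Optional[Sequence[float]] = None) -> float:
    """Second (hypothetical) weight set: weight each document triple's best
    match among the reference triples, then normalize by the total weight."""
    if not doc_triples or not ref_triples:
        return 0.0
    if triple_weights is None:
        triple_weights = [1.0] * len(doc_triples)
    total = sum(triple_weights)
    if total == 0:
        return 0.0
    score = sum(
        w * max(triple_similarity(t, r) for r in ref_triples)
        for w, t in zip(triple_weights, doc_triples)
    )
    return score / total
```

In this sketch, `best_match` stands in for the step that pairs a citing sentence with a sentence in the cited reference, and `element_set_similarity` stands in for the comparison between the two sets of stored triples. Any real implementation would also need proper tokenization, which whitespace splitting does not provide for Chinese text.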

【Keywords (Chinese)】 引文链; 知识元; 本体; 相似度
【Key words】 Citation chain; Knowledge element; Ontology; Similarity

