Research on Relation Extraction for Social Network Applications

【Author】 江超男 (Jiang Chaonan)

【Supervisor】 丁晟春 (Ding Shengchun)

【Author Information】 Nanjing University of Science and Technology, Information Science, 2010, Master's thesis

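【Abstract (translated from the Chinese)】 Since the advent of search engines, people have been flooded with information, most of which is duplicated. Search engines return too many results, and useful information is still hard to find. If a method could effectively filter retrieval results, extract only the key information people need, and present it as a network graph rather than merely as text, the efficiency of information acquisition would improve greatly. Motivated by this, this thesis studies the extraction of relations between named entities in the social network domain. It attempts to build a social relationship ontology for this domain and, from sentences containing two or more named entities, extracts the corresponding words as descriptions of the relations between those entities. It also defines a series of SWRL rules and, together with the Jess inference engine, uses them to mine implicit social relations in the ontology. In the named entity recognition task, the thesis focuses on personal names and organization names. Drawing on the idea of semantic role labeling, it uses the Viterbi algorithm to automatically tag the role that each segmented fragment of a sentence plays within a personal or organization name; it then derives word-formation rules from the way such names are composed and applies them through pattern matching to obtain the final recognition results. In an open test on a real-world corpus, the method's recall was higher than its precision and approached 70%, which supports the effectiveness of the approach. In the relation extraction task, the thesis combines the seven-step method and the iterative method of ontology engineering to build a social relationship ontology for the social network domain, applied to enterprises in the Internet industry. A series of SWRL rules was also designed and imported, together with the ontology, into the Jess rule inference engine, so that reasoning over the ontology's rigorous conceptual and logical relations could uncover implicit social relations between entities. The final output is a set of (entity, relation, entity) triples stored in a relation database, which condenses the information considerably and improves the efficiency of information acquisition.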
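The role-tagging step described above (Viterbi decoding of name roles, followed by word-formation pattern matching) can be sketched as a small hidden Markov model. This is a minimal illustration under stated assumptions, not the thesis's implementation: the role set (`B` surname, `C` first given-name character, `D` last given-name character, `O` other), all probabilities, the smoothing floor, and the regular-expression rule `BC?D` are hypothetical, and only personal names are covered, not organization names.

```python
import math
import re

# Hypothetical role set (an assumption, not taken from the thesis):
# B = surname, C = first given-name char, D = last given-name char, O = not part of a name
ROLES = ["B", "C", "D", "O"]

# Toy, hand-set probabilities; a real tagger would estimate these from an annotated corpus.
START = {"B": 0.2, "C": 0.01, "D": 0.01, "O": 0.78}
TRANS = {
    "B": {"B": 0.01, "C": 0.6, "D": 0.3, "O": 0.09},
    "C": {"B": 0.01, "C": 0.01, "D": 0.9, "O": 0.08},
    "D": {"B": 0.05, "C": 0.01, "D": 0.01, "O": 0.93},
    "O": {"B": 0.1, "C": 0.01, "D": 0.01, "O": 0.88},
}
EMIT = {
    "B": {"马": 0.3, "张": 0.3, "李": 0.3},
    "C": {"化": 0.2, "云": 0.2},
    "D": {"腾": 0.2, "云": 0.2},
    "O": {"创办": 0.1, "了": 0.2, "腾讯": 0.05, "说": 0.1},
}
FLOOR = 1e-6  # emission probability for a token never seen under a role


def viterbi(tokens):
    """Return the most probable role sequence for already-segmented tokens."""
    def log_emit(role, tok):
        return math.log(EMIT[role].get(tok, FLOOR))

    # score[r]: best log-probability of any role path ending in role r
    score = {r: math.log(START[r]) + log_emit(r, tokens[0]) for r in ROLES}
    backptrs = []
    for tok in tokens[1:]:
        prev, score, ptr = score, {}, {}
        for r in ROLES:
            best = max(ROLES, key=lambda p: prev[p] + math.log(TRANS[p][r]))
            score[r] = prev[best] + math.log(TRANS[best][r]) + log_emit(r, tok)
            ptr[r] = best
        backptrs.append(ptr)
    # Trace back from the highest-scoring final role.
    path = [max(ROLES, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def extract_person_names(tokens, roles):
    """Hypothetical word-formation rule: surname + optional first given char + last given char."""
    tags = "".join(roles)  # one tag character per token, so match indices line up with tokens
    return ["".join(tokens[m.start():m.end()]) for m in re.finditer(r"BC?D", tags)]


tokens = ["马", "化", "腾", "创办", "了", "腾讯"]
roles = viterbi(tokens)  # ['B', 'C', 'D', 'O', 'O', 'O']
print(extract_person_names(tokens, roles))  # ['马化腾']
```

Working in log space keeps long sentences from underflowing, and keeping the pattern-matching rules separate from the decoder mirrors the two-stage design the abstract describes: statistical role tagging first, rule-based assembly of name spans second.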

【Abstract】 Since the advent of search engines, we have been surrounded by a huge amount of information, most of it repetitive. Search engines return so many results that useful information is hard to find. People could obtain the information they need far more efficiently if a method could filter retrieval results and extract only the key information. Based on this, this thesis mainly studies relation extraction between named entities in the social network domain. It attempts to build a social relationship ontology for social networks: relevant words are extracted from sentences containing two or more named entities to describe the relations between them. In addition, a series of SWRL rules is defined and, with the help of the Jess reasoning engine, used to infer and mine implicit relationships between entities. In the named entity recognition task, the study mainly identifies personal names and organization names. Using the idea of semantic role labeling together with the Viterbi algorithm, it tags the role that each segmented fragment of a sentence plays within a personal or organization name. Word-formation rules derived from the characteristics of these names are then applied through pattern matching to obtain the final results. In an open test on a real-world corpus, recall, which approaches 70%, was higher than precision, and there is still considerable room for improvement. The results show the effectiveness of the method. In the relation extraction task, this thesis combines the seven-step method and the iterative method from ontology engineering to construct a social relationship ontology for social networks, applied to Internet-industry enterprises.
The SWRL rules and the social relationship ontology are imported into the Jess rule reasoning engine to mine implicit relationships between entities, finally yielding (entity, relationship, entity) relation triples. The method greatly condenses the information content and improves the efficiency of information acquisition.
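The rule-based mining of implicit relations described above can be illustrated with a naive forward-chaining loop over (entity, relation, entity) triples. This sketch swaps in plain Python for the thesis's SWRL rules and Jess engine: it does not parse SWRL, apply OWL semantics, or use Jess's Rete network. The relation names (`founderOf`, `worksFor`, `colleagueOf`), the entities, and both rules are hypothetical; the comment above each rule shows how the same rule might be written in SWRL's human-readable syntax.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, relation, entity)
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def founder_works_for(kb: Set[Triple]) -> Iterable[Triple]:
    # founderOf(?x, ?o) -> worksFor(?x, ?o)
    for s, p, o in kb:
        if p == "founderOf":
            yield (s, "worksFor", o)


def colleagues(kb: Set[Triple]) -> Iterable[Triple]:
    # worksFor(?x, ?o) ^ worksFor(?y, ?o) ^ differentFrom(?x, ?y) -> colleagueOf(?x, ?y)
    staff: Dict[str, Set[str]] = {}
    for s, p, o in kb:
        if p == "worksFor":
            staff.setdefault(o, set()).add(s)
    for people in staff.values():
        for x in people:
            for y in people:
                if x != y:
                    yield (x, "colleagueOf", y)


def forward_chain(facts: Iterable[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Fire every rule repeatedly until no new triple is derived (a fixpoint)."""
    kb = set(facts)
    rules = list(rules)
    while True:
        new = {t for rule in rules for t in rule(kb)} - kb
        if not new:
            return kb
        kb |= new


facts = {
    ("PersonA", "founderOf", "CompanyX"),
    ("PersonB", "worksFor", "CompanyX"),
    ("PersonC", "worksFor", "CompanyY"),
}
closure = forward_chain(facts, [founder_works_for, colleagues])
implicit = closure - facts  # relations that were never stated explicitly
for triple in sorted(implicit):
    print(triple)
```

The `colleagueOf` triples appear only in the second round, after the first rule has produced `worksFor` for the founder, which is why the loop runs to a fixpoint rather than applying each rule once. Re-evaluating every rule against the whole knowledge base each round is simple but wasteful; a Rete-based engine such as Jess matches incrementally instead.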

  • 【CLC Classification Number】 G350
  • 【Times Cited】 2
  • 【Downloads】 326