

Research and Implementation of Intelligent Search Engine Based on Semantic Web Technology

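【Author】 Pan Ning (潘宁)

【Advisor】 Liu Xiaohong (刘晓鸿)

【Author information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2009, Master's degree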

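【Abstract, translated from the Chinese】 The Internet is the world's largest repository of data and information; as its coverage and range of fields keep expanding, the data stored on it grows massively. Search engines help users extract latent, valuable information from this mass of data. Building on vertical search engines that target specific domains, research into more efficient, intelligent search engines has become an inevitable development. This thesis uses Semantic Web technology to give the search engine natural language understanding based on knowledge and ontology concepts. The search engine is built on top of a knowledge base, and a semantic indexer constructs an index that integrates knowledge with Internet data. A user's query undergoes word segmentation, semantic reasoning, and query expansion, and is then searched against the index in normalized form. Search results are ranked by combining three factors: the Page Ranking algorithm, word-sense and semantic analysis, and the relevance between the retrieved content and web page features. A search engine using this approach reduces the impact of vague user expression on search; overcomes the shortcomings of mechanical keyword matching; presents things as interrelated rather than isolated; and achieves a systematic integration of knowledge.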

【Abstract】 The Internet is the largest database in the world. As it covers more and more fields, the information it contains grows constantly and rapidly. Search engines help users find valuable, latent information. Building on domain-focused vertical information retrieval techniques, research into more efficient and intelligent search engines has become an inevitable trend. This thesis introduces a method that gives a search engine natural language understanding by using Semantic Web techniques. The search engine is built on a knowledge base. It uses a semantic indexer to construct an index database that contains both knowledge and web data. A user's query is processed through tokenization, semantic reasoning, and query expansion; the resulting normalized form is used to query the index database. Search results are ranked according to the Page Ranking algorithm, a semantic analysis factor, and the relevance of the query to web page features. With this method, the search engine reduces the effect of vague expressions and overcomes the shortcomings of mechanical keyword matching; things no longer appear in isolation but in relation to one another; and knowledge is integrated in a systematic way.

【Key words】 Search Engine; Semantic Web; Ontology; Hadoop; Lucene
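
【Illustrative sketch】 The query pipeline and three-factor ranking described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the toy `ONTOLOGY` dictionary, the `Page` fields, the whitespace tokenizer (a stand-in for real Chinese word segmentation), and the linear weighting in `score` are all hypothetical. The abstract names the three ranking factors but does not say how they are combined or weighted.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each term maps to related terms
# (e.g. synonyms or subclasses) that a reasoner might infer.
ONTOLOGY = {
    "laptop": {"notebook", "ultrabook"},
    "price": {"cost"},
}


def tokenize(query):
    """Whitespace tokenizer; a stand-in for real word segmentation."""
    return query.lower().split()


def expand_query(tokens):
    """Query expansion: add ontology-related terms to the original tokens."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= ONTOLOGY.get(token, set())
    return expanded


@dataclass
class Page:
    url: str
    text: str
    link_score: float  # assumed precomputed link-analysis score in [0, 1]


def score(page, tokens, expanded, weights=(0.3, 0.3, 0.4)):
    """Combine the three factors linearly (the weights are assumptions)."""
    words = set(page.text.lower().split())
    related = expanded - set(tokens)
    # Semantic factor: share of ontology-derived terms found in the page.
    semantic = len(words & related) / len(related) if related else 0.0
    # Relevance factor: share of the original query terms found in the page.
    relevance = len(words & set(tokens)) / len(tokens) if tokens else 0.0
    w_link, w_sem, w_rel = weights
    return w_link * page.link_score + w_sem * semantic + w_rel * relevance


def search(query, pages):
    """Return page URLs ordered from best to worst combined score."""
    tokens = tokenize(query)
    expanded = expand_query(tokens)
    ranked = sorted(pages, key=lambda p: score(p, tokens, expanded), reverse=True)
    return [p.url for p in ranked]
```

In this sketch a page that contains only ontology-related terms (for example "notebook" for the query term "laptop") can still outrank an unrelated page with a higher link score, which is the effect the abstract attributes to moving beyond mechanical keyword matching.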