
Design and Implementation of an Information Document Search Engine System Based on the JavaEE Platform and Lucene

【Author】 Gui Xujun (桂许军)

【Supervisor】 He Feng (何枫)

【Author information】 Southwest Jiaotong University, Computer Application Technology, 2011, Master's degree

【Abstract】 With the rapid development of the Internet, network applications now reach every aspect of large enterprises and document institutions, and Internet use generates an enormous volume of data and information at every moment. At the same time, the business processes of enterprises and institutions themselves produce large numbers of information documents. A large share of these are heterogeneous documents, which are very difficult to retrieve and manage. An efficient retrieval system is therefore needed to improve the sharing and utilization of information resources. Taking into account the characteristics of industry-specific search engines and current practical requirements, this thesis develops an information document search engine system on the JavaEE platform. The system is written in Java, adopts a multi-tier architecture informed by design patterns, and integrates currently popular technologies such as Lucene and Ajax. The thesis first introduces the research background and significance of the topic and analyzes the current state of information document retrieval and its future direction. It then describes and analyzes the technologies and basic principles on which the search engine system relies. Next, it gives a preliminary analysis of the system's overall requirements, together with its functional and data requirements, from the perspectives of information collection, index building, and information retrieval. Because the system is user-oriented, the object-oriented Unified Modeling Language (UML) is used to analyze it and to produce its use case diagrams and overall architecture diagram. Building on the requirements analysis, the thesis then divides the system into its core modules and functions and designs them, using flow charts to describe the processing flow of each core module in detail. UML is also used to design the static structure diagram of each module, and the system database is designed from these static structure diagrams and the entity objects. Finally, each module is designed and implemented in detail, with sequence diagrams and views of the running system. The system has a simple, intuitive user interface, is user-friendly and easy to operate, and meets users' retrieval needs well.

【Key words】 Search engine; Lucene; Ajax; Web crawler; JavaEE