

The Study of Grid Information Retrieval Model Based on Ontology

【摘要】 目前在国内外学术界,有关网格技术的研究项目层出不穷;由于社会的推动,网格技术的开发和应用也开始受到企业,如GOOLE,IBM,微软等大型企业的大力支持,其发展方向与Web服务技术紧密结合,开始应用到商业计算领域中,其中一个重要的领域就是信息检索领域。信息检索经历了手工检索、计算机检索到目前网络化检索、智能化检索等多个发展阶段。检索的工具也由通用的搜索引擎如Google,Yahoo发展到如今应用领域高度专业化的检索工具。网格信息检索是信息检索与网格计算技术相结合的新领域。在网格信息检索中,网格计算的工具应用到了信息检索领域,为信息检索提供了多种优质服务,如资源管理、查询调度等;同时,由于网格环境具有异构性、可扩展性、动态自适应性等特点,这些特性决定了网格中信息必然有格式多样、异构性强、信息量大、信息内容动态变化和信息源分布自治等特点。网格信息检索给人们带来方便的同时,也带来了两个突出的问题。首先网格环境下资源的异构问题。网格能够充分的吸纳各种计算资源,并将它们转化为一种随处可得的、可靠的、标准的同时还是经济的计算能力。然而这些资源通常属于不同的组织,跨越不同的领域,满足各自不同的资源约束。由此可见,网格要面对的是一个庞大的彼此异构的资源服务群。另外,一个突出的问题就是网格环境下的服务查询问题。在已有的网格系统中,用于完成服务注册、发布、查询以及调用的都是一些成型的信息系统,这些系统在这方面的问题尤为突出。Globus的MDS和UDDI就是其中最典型的两个例子。MDS和UDDI都只支持简单的查询方式,并且不能提供富有表现力的服务描述能力。为了提高服务发现的效率和精度,目前出现了很多在网格中加入有关网格本体的研究。网格本体的所谓语义,就是在强大的知识存储的基础之上,让计算机能够理解我们的需求,帮助我们快速找到目标,从而实现根据用户提出的所需服务的功能描述准确的查找到服务。因此,如何快速有效地构建网格本体和提高现有本体匹配算法是网格信息检索问题的两个关键。传统的网格本体的构建,是依据领域专家构建网格本体库。但是由于被描述的网格本体的分布性、异构性、动态性、自治性等特点,这种方法实际上就是把领域专家的意愿强加到资源上,而且这种方式的低效和主观性无视了网格本身的特点。另外,在当前存在的基于语义研究中通常采用的都是扩展网格服务属性的方法,建立属性值之间等价及相容关系来提高匹配精度,其实质是一种改进的属性名到属性值的匹配,并不是真正网格本体之间的匹配。本文通过分析现有网格检索系统研究热点入手,选择了网格检索模型的构建作为突破口,在充分分析了现有网格检索模型优点和缺点的基础上,指出网格本体的构建的非自动化和网格本体匹配的模式算法的低效是目前网格检索中遇到的主要难题。针对现有网格本体构建需要领域专家参与、采用人工或者半自动的方法效率低下的问题,本文在利用现有本体技术的基础上,结合网格技术自身的特点,提出了通过LDAP(Lightweight Directory Access Protocol,轻量级目录访问协议)信息自动构造网格领域本体的方法,同时利用对网格领域本体的解析和UDDI网格注册信息的提取构造网格服务本体,在此基础上提出了一种网格本体自动构建的理论模型。在充分研究现有网格本体匹配算法的基础上,为了实现真正的本体与本体的匹配,本文提出了构造网格虚本体来形式化描述用户需求,即通过网格领域虚本体和网格服务虚本体的构造为网格本体的匹配做了铺垫。在网格本体匹配的研究中,作者依据自己提出的网格本体自动构建理论,实现了网格领域本体和网格服务本体的分别匹配。具体来讲,在网格领域本体匹配的研究中,本文提出了利用本体分层和图匹配来提高匹配的准确率,对其匹配模式和匹配算法都做了相应的改进和提高;在网格服务本体的匹配中,改进了现有的服务本体的匹配算法,从而实现匹配全过程的定量分析。针对网格用户检索需求越来越复杂的情况,单一网格本体的匹配已无法满足用户需要,这就需要用网格本体的组合来解决这个问题。本文把网格服务组合的若干问题用本体的思想进行了形式化描述,同时对网格服务本体如何组合、如何实现匹配都提出了相应的模型和算法。在对现有网格信息检索模型进行充分研究的基础上,作者结合自身提出的网格本体自动构建理论和改进的匹配模式,提出了基于本体自构的网格信息检索模型。在论文的最后部分,为了验证模型的有效性,作者依据现有的仿真系统对本文构建的网格信息检索模型的进行了验证。

【Abstract】 Research projects on grid technology are emerging continuously in academia at home and abroad. Driven by social demand, the development and application of grid technology have also begun to receive strong support from large enterprises such as Google, IBM, and Microsoft. Grid technology is developing in close combination with Web services technology and is beginning to be applied to commercial computing, where information retrieval is one important field. Information retrieval has passed through several stages of development, from manual retrieval and computer-based retrieval to today's networked and intelligent retrieval. Retrieval tools have likewise evolved from general-purpose search engines such as Google and Yahoo into tools that are highly specialized for particular application domains.

Grid information retrieval is a new field that combines information retrieval with grid computing technology. It applies grid computing tools to information retrieval and provides a range of high-quality services such as resource management and query scheduling. At the same time, the grid environment is heterogeneous, scalable, and dynamically self-adaptive, and these properties mean that information in the grid necessarily comes in diverse formats, is highly heterogeneous and large in volume, changes dynamically, and originates from distributed, autonomous sources.

While grid information retrieval brings convenience, it also raises two prominent problems. The first is the heterogeneity of resources in the grid environment. The grid can absorb all kinds of computing resources and turn them into computing power that is readily available, reliable, standardized, and economical. However, these resources usually belong to different organizations, span different domains, and satisfy their own resource constraints.
The grid therefore has to deal with a very large population of mutually heterogeneous resources and services. The second problem is service query in the grid environment. In existing grid systems, service registration, publication, query, and invocation are handled by established information systems, and the problem is especially pronounced in these systems. The MDS of Globus and UDDI are the two most typical examples. Both support only simple query modes and cannot provide expressive service descriptions. UDDI, for example, supports queries only by keyword and classification information and offers no strong mechanism for automatic service discovery.

To improve the efficiency and precision of service discovery, much recent research has introduced grid ontologies into the grid. The semantics of a grid ontology rest on a powerful knowledge store: they let the computer understand the user's needs and help locate the target quickly, so that a service can be found accurately from the user's functional description of the service required. How to construct grid ontologies quickly and effectively, and how to improve existing ontology matching algorithms, are therefore the two key issues in grid information retrieval.

Traditionally, grid ontology libraries are built by domain experts. However, because the grid ontologies being described are distributed, heterogeneous, dynamic, and autonomous, this approach in effect imposes the experts' views on the resources, and its inefficiency and subjectivity disregard the characteristics of the grid itself. Moreover, existing semantics-based studies usually extend the attributes of grid services and establish equivalence and compatibility relations between attribute values to improve matching precision. In essence this is an improved matching of attribute names to attribute values.
These methods do not achieve true matching between grid ontologies.

Starting from an analysis of current research hotspots in grid retrieval systems, this paper takes the construction of a grid retrieval model as its point of entry. After a thorough analysis of the strengths and weaknesses of existing grid retrieval models, it identifies the lack of automation in grid ontology construction and the inefficiency of grid ontology matching patterns and algorithms as the main difficulties in grid retrieval today. Existing grid ontology construction requires the participation of domain experts and relies on manual or semi-automatic methods, which is inefficient. Building on existing ontology technology and the characteristics of grid technology itself, this paper proposes a method that constructs the grid domain ontology automatically from LDAP (Lightweight Directory Access Protocol) information. It also constructs the grid service ontology by parsing the grid domain ontology and extracting UDDI grid registration information, and on this basis proposes a theoretical model for the automatic construction of grid ontologies. After a thorough study of existing grid ontology matching algorithms, and in order to achieve genuine ontology-to-ontology matching, the paper proposes constructing grid virtual ontologies to describe user requirements formally: a grid domain virtual ontology and a grid service virtual ontology are constructed to lay the groundwork for grid ontology matching.

In the study of grid ontology matching, the author applies the proposed theory of automatic grid ontology construction to match grid domain ontologies and grid service ontologies separately.
Specifically, for grid domain ontology matching, this paper uses ontology layering and graph matching to raise matching accuracy and improves both the matching pattern and the matching algorithm accordingly. For grid service ontology matching, it improves the existing service ontology matching algorithm so that the entire matching process can be analyzed quantitatively.

As the retrieval requirements of grid users grow more complex, matching against a single grid ontology can no longer meet their needs, and combinations of grid ontologies are required. This paper gives an ontology-based formal description of several problems in grid service composition and proposes models and algorithms for how grid service ontologies are composed and how they are matched.

Based on a thorough study of existing grid information retrieval models, and combining the proposed theory of automatic grid ontology construction with the improved matching patterns, the author proposes a grid information retrieval model based on ontology self-construction. In the final part of the paper, the model is validated on an existing simulation system to verify its effectiveness.
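
The abstract does not say how LDAP information is turned into a domain ontology. As a rough illustration only, and not the author's algorithm, the sketch below assumes that the parent-child structure already present in LDAP distinguished names (DNs) can seed a concept hierarchy. The function name `build_hierarchy` and the example DNs are hypothetical, and the comma split is a simplification that ignores escaped commas inside attribute values.

```python
from collections import defaultdict


def build_hierarchy(dns):
    """Map each DN suffix to the sorted list of its direct children.

    Assumption: every entry is a plain comma-separated DN with the most
    specific component first, e.g. "cn=node1,ou=compute,o=grid".
    """
    children = defaultdict(set)
    for dn in dns:
        rdns = [part.strip() for part in dn.split(",")]
        # Walk from the root (last component) down to the full entry,
        # linking each DN suffix to the suffix one level more specific.
        for i in range(len(rdns) - 1, 0, -1):
            parent = ",".join(rdns[i:])
            child = ",".join(rdns[i - 1:])
            children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


if __name__ == "__main__":
    # Hypothetical directory entries, for illustration only.
    entries = [
        "cn=node1,ou=compute,o=grid",
        "cn=node2,ou=compute,o=grid",
        "cn=disk1,ou=storage,o=grid",
    ]
    for parent, kids in build_hierarchy(entries).items():
        print(parent, "->", kids)
```

Each key in the result could serve as a candidate concept and each list as its sub-concepts. A real construction would also need to map LDAP object classes and attributes to ontology classes and properties, which this sketch leaves out.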

