Research on Deep Web Query Interface Matching Technology (Deep Web查询接口匹配技术研究)
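【Author】 Cao Qinghuang (曹庆皇);
【Advisor】 Ju Shiguang (鞠时光);
【Author information】 Jiangsu University, Computer Application Technology, 2009, Master's thesis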
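【Abstract (translated from the Chinese)】 The rapid development of Internet technology has led to the wide use of Web databases. These databases are hidden behind query interfaces, and users can obtain their contents only by submitting requests through the local query interfaces. Because search engines cannot reach this information by following hyperlinks, it is called Deep Web information. Given the massive volume of Deep Web information, building a Deep Web information integration system is especially important. Such a system classifies Web databases by domain and builds one unified query interface for each domain. A query submitted to the unified query interface can be sent to multiple local query interfaces at the same time. Mapping a request from the unified query interface to each local query interface requires solving the query interface matching problem, which makes query interface matching the foundation of a Deep Web information integration system. Because existing methods cannot handle complex matching between query interfaces effectively, this thesis proposes a new matching method: positive-correlation association mining discovers potential group attributes, each group attribute is then treated as a single attribute, and attributes with the same semantics are grouped by semantic clustering to obtain the matches. Finally, a Deep Web information integration system for the book search domain is implemented. The main work includes: (1) A method for generating group attributes based on association mining. To address imprecise attribute correlation computation, a mutual-information-based attribute correlation measure is designed; it reflects the characteristics of group attributes and handles both the sparse-attribute problem and the high-frequency-attribute problem. In addition, to improve efficiency, the concept of an "attribute matrix" is introduced: all computation is performed on a matrix containing only 0s and 1s, so complex probability calculations reduce to simple AND operations. (2) A method for generating synonym attributes based on semantic clustering. Semantic similarity between attributes is computed with the help of a semantic net; to compensate for attributes that carry too little semantic information, data-domain similarity is added to the attribute similarity computation. Weighting semantic similarity and data-domain similarity together improves the precision of the attribute similarity. (3) The design and implementation of a Deep Web information integration system for the book search domain, with an analysis of how the matching technique is applied in the system. All domain-specific information is stored in a configuration file, so an integration system for a new domain can be set up quickly by changing the configuration file.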
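The attribute-matrix idea in the abstract can be illustrated with a minimal sketch. It assumes that each row of the matrix is one query interface and each column one attribute, and it uses standard pointwise mutual information as a stand-in, since the abstract does not give the thesis's exact correlation measure. The function names and the sample book interfaces are hypothetical.

```python
import math

def build_attribute_matrix(interfaces, attributes):
    """Return one 0/1 row per interface and one column per attribute."""
    return [[1 if a in iface else 0 for a in attributes] for iface in interfaces]

def cooccurrence(matrix, i, j):
    """Count interfaces containing both attribute i and attribute j (row-wise AND)."""
    return sum(row[i] & row[j] for row in matrix)

def pmi(matrix, i, j):
    """Pointwise mutual information log2(P(i,j) / (P(i) * P(j))).

    Assumed stand-in for the thesis's measure; returns -inf when the two
    attributes never occur together.
    """
    n = len(matrix)
    both = cooccurrence(matrix, i, j)
    if both == 0:
        return float("-inf")
    count_i = cooccurrence(matrix, i, i)  # AND of a column with itself = its frequency
    count_j = cooccurrence(matrix, j, j)
    return math.log2(both * n / (count_i * count_j))

# Hypothetical book-search interfaces, each given as its set of attribute labels.
interfaces = [
    {"title", "first name", "last name"},
    {"title", "author"},
    {"title", "first name", "last name", "isbn"},
    {"title", "author", "isbn"},
]
attributes = ["title", "author", "first name", "last name", "isbn"]
matrix = build_attribute_matrix(interfaces, attributes)

print(pmi(matrix, 2, 3))  # "first name" vs "last name": always together
print(pmi(matrix, 0, 2))  # "title" vs "first name": title appears everywhere
```

In this toy data, "first name" and "last name" score higher than "title" and "first name", which matches the intuition that a group attribute occurs together often and alone rarely, while a high-frequency attribute such as "title" should not be grouped with everything it happens to accompany.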
【Abstract】 With the rapid development of Internet technology, Web databases have come into wide use. These databases are hidden behind local query interfaces, and users must submit requests through those interfaces to obtain information. The Deep Web is the information in such databases that search engines cannot index. Deep Web data integration systems have recently received growing attention because of the large volume, high quality, and well-structured format of Deep Web data. A Deep Web data integration system groups Web databases by domain and establishes a unified query interface for each domain. A user who submits a request through the unified query interface sends it to every local query interface at the same time. Mapping requests between the unified query interface and the local query interfaces raises the query interface matching problem, so query interface matching is a prerequisite for data integration. Building on existing methods, this thesis first studies query interface matching and proposes a new matching method that uses association mining to find positively correlated attributes, forms potential group attributes from them, and finds synonym attributes by clustering. It then implements a Deep Web data integration system for the book domain. The main work is summarized as follows: (1) Design a new correlation measure based on mutual information and implement it with matrices. The measure reflects the defining trait of group attributes, which often occur together and rarely appear alone, and it handles sparse and high-frequency attributes. In addition, an attribute matrix containing only 0s and 1s is proposed to improve efficiency. (2) Add semantic and domain components to the computation of attribute similarity. A semantic net is used to compute semantic similarity, and domain similarity is also calculated to improve the precision of attribute similarity. (3) Design and implement a data integration system for the book domain. The guiding principle is to keep the system independent of any particular domain: everything domain-specific is stored in a configuration file that can be modified when the application domain changes, which helps establish a new system quickly. A data integration system for the book domain is completed at the end of the thesis.
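The weighted combination of semantic and domain similarity described in item (2) might look like the following sketch. It assumes the semantic similarity has already been computed from a semantic net and is passed in as a number in [0, 1], and it uses Jaccard overlap of the attributes' value sets as an assumed stand-in for domain similarity. The function names and the default weight of 0.6 are hypothetical, not values taken from the thesis.

```python
def domain_similarity(values_a, values_b):
    """Jaccard overlap of two attributes' value sets (an assumed stand-in)."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0  # no value information on either side
    return len(a & b) / len(a | b)

def attribute_similarity(semantic_sim, values_a, values_b, weight=0.6):
    """Weighted sum of semantic similarity and data-domain similarity.

    `weight` is the share given to semantic similarity; the remainder goes
    to domain similarity. The default is illustrative only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    dom = domain_similarity(values_a, values_b)
    return weight * semantic_sim + (1.0 - weight) * dom

# Two hypothetical attributes with weak label similarity but overlapping values.
binding = ["hardcover", "paperback"]
book_format = ["paperback", "ebook"]
print(attribute_similarity(0.5, binding, book_format, weight=0.5))
```

The domain term lets two attributes with uninformative labels still score as similar when their selectable values overlap, which is the gap the abstract says the data-domain component is meant to fill.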
【Key words】 complex matching; Deep Web; association mining; clustering; semantic net; mutual information;