
Research and Design of Deep Web Entrance Recognition and Personalized Search

【Author】 Chen Wen (陈文)

【Advisor】 Yan Li (晏立)

【Author information】 Jiangsu University, Computer Application Technology, 2010, Master's thesis

【Abstract】 Users access Deep Web sites mainly through the query interfaces that those sites expose on their Web pages. To help users find Deep Web information simply and efficiently, an integrated search system must provide a unified query interface that queries multiple Deep Web sites at once. Deep Web entrance recognition is a key component of such a system: it supplies the information sources to be searched and is the prerequisite for all later stages. Meanwhile, the volume of Deep Web information is enormous. To make the data returned by integrated search more useful and to avoid "information overload", the integrated results must be processed so that users receive a personalized search service.

This thesis studies techniques for Deep Web entrance recognition and for displaying integrated Deep Web search results. It presents a PU active learning algorithm with incremental learning ability and applies it to Deep Web entrance recognition, presents a personalized search method for Deep Web integration, and designs and implements a prototype personalized search system for Deep Web integration. The main work is as follows:

(1) Identifying and classifying Deep Web entrances among a continuously growing set of Web pages. For the case where few initial positive samples are available and negative samples for the different classes are hard to obtain, a PU active learning algorithm with incremental learning ability is presented. The algorithm uses three support vector machines (SVMs) for cooperative semi-supervised learning and, in parallel, grid-based clustering for unsupervised learning. When the classification and clustering results disagree, active learning is used to label the unlabeled samples. The algorithm is applied to the online recognition and classification of Deep Web entrances. Experiments show that it improves both the ability to discover new classes and the ability to handle incrementally arriving unlabeled samples.

(2) Easing the information overload caused by the large amount of information on integrated Deep Web result pages. A personalized search method for Deep Web integration is presented. The method builds an interest tree from a Deep Web site directory and a user questionnaire, and updates user interests according to user feedback and the parameters returned by member Deep Web sites. Result pages are filtered and ranked according to each user's interests to produce the final displayed page. Experimental results show that the method improves Deep Web integrated search by making the information that interests the user more prominent.

(3) Designing and implementing a prototype personalized search system for Deep Web integration, and analyzing how the techniques above are applied in it. Practical use shows that the system achieves good results.
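The disagreement-driven labeling step in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation, and it rests on several assumptions: the three SVMs are replaced by arbitrary callables, the grid-based clustering is reduced to treating each occupied grid cell as its own cluster, and the PU-specific handling and classifier retraining are omitted. All function names (`majority_vote`, `grid_cell`, `cluster_labels`, `label_incrementally`) and the `oracle` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(classifiers, x):
    """Return the label predicted by most classifiers (stand-ins for the three SVMs)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def grid_cell(x, cell_size):
    """Map a feature vector to the index of the grid cell that contains it."""
    return tuple(int(v // cell_size) for v in x)


def cluster_labels(labeled, cell_size):
    """Give each occupied cell the majority label of the labeled samples inside it.

    Assumption: one cell equals one cluster; a full grid-based clustering
    method would also merge dense neighbouring cells.
    """
    per_cell = defaultdict(Counter)
    for x, y in labeled:
        per_cell[grid_cell(x, cell_size)][y] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in per_cell.items()}


def label_incrementally(classifiers, labeled, unlabeled, oracle, cell_size=1.0):
    """Label new samples, asking the oracle only when the two views disagree."""
    cells = cluster_labels(labeled, cell_size)
    results, queried = [], []
    for x in unlabeled:
        predicted = majority_vote(classifiers, x)
        clustered = cells.get(grid_cell(x, cell_size))  # None for an unseen cell
        if clustered == predicted:
            label = predicted      # classification and clustering agree
        else:
            label = oracle(x)      # active learning: request a label
            queried.append(x)
        results.append((x, label))
    return results, queried
```

A sample that lands in a grid cell with no labeled neighbours is always sent to the oracle in this sketch, which is one simple way a previously unseen class could surface. The returned pairs could then be added to the labeled set before the next batch arrives.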

  • 【Online publication contributor】 Jiangsu University
  • 【Online publication year and issue】 2010, Issue 08