
自适应网络信息获取服务技术研究

Research on Adaptive Techniques for Web Information

【Author】 刘康苗

【Advisors】 陈纯; 卜佳俊

【Author information】 Zhejiang University, Computer Science and Technology, 2008, PhD

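【Abstract, translated from the Chinese】 The development of network technology has greatly enriched the information resources available to users, but the disorder and uneven quality of those resources also make Web information hard to obtain. A Web information acquisition service provides individual Internet users with the information products and services they need by means of modern information technology; its service modes are information pull and information push. Adaptive Web information acquisition service techniques dynamically adjust the service's own behavior according to user needs, information source characteristics, system load, and other factors, so as to deliver high-quality information efficiently and in a user-friendly way. Perceiving user needs accurately and completely is the foundation of such a service. Web users are both consumers and providers of Web information resources, so their needs can be inferred by analyzing what they browse, how they behave, and what they publish. Once the needs are known, the key to a successful service is filtering the relevant information out of the vast body of Web resources and presenting it in a more user-friendly way. In addition, users usually expect information promptly, so guaranteeing the performance of the acquisition system is also an important research topic.

To address these problems, this thesis first proposes an adaptive information pull technique based on measuring query ambiguity. The ambiguity of a user request is measured, and the way results are presented is decided adaptively from it. For result filtering and presentation, a multi-feature fusion ranking algorithm and a clustering algorithm are proposed, respectively; both are validated on two representative kinds of emerging Internet resources: multimedia information (with images as the example) and frequently updated dynamic resources (with blogs as the example).

Second, for the information publishers and the information browsers involved in Web activity, the thesis proposes one adaptive information push technique each, based on personalized modeling. For publishers, taking blogs, a currently popular personalized publishing platform, as the research environment, it proposes a method that models users' long-term and short-term interests from their blog posts and partitions the blogspace into communities, which enables the recommendation of friends with similar interests. For browsers, it treats the content of the page currently being viewed as a representation of the user's personal profile and proposes a contextual advertisement recommendation technique based on sentiment and topic analysis, so that pushed advertisements are not only topically relevant but also consistent with the user sentiment latent in the page content, and hence better targeted.

Next, to meet the performance and scalability requirements of Web information acquisition services, and taking the search engine, a typical information pull application, as the entry point, the thesis proposes a hybrid distributed index organization strategy with good scalability (Loc-Glob). Performance is then optimized on top of Loc-Glob: the index is redistributed and replicated based on index term load and the dynamically changing query stream, and query paths are adaptively optimized based on the real-time system load of the index servers.

Based on this research, the thesis designs and implements a prototype information acquisition system for the blogspace that adopts adaptive techniques. It offers a blog search engine, blog friend recommendation, advertisement recommendation, and other application services, and verifies the feasibility of the adaptive techniques proposed for the pull and push service modes. The thesis closes with a summary of the work and an outlook.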

【Abstract】 The rapid development of Web technology has greatly enriched the information resources available to users. However, these resources are disordered and of uneven quality, which makes it difficult for users to acquire information. A Web Information Acquisition Service (WIAS) provides users with Web information products and services that meet their personal information needs by means of modern information technology; pull and push are its two main service modes. Adaptive techniques for WIAS dynamically adjust the service's behavior according to users' information needs, information source characteristics, system load, and other factors, and deliver high-quality information efficiently and in a user-friendly manner.

An accurate and complete understanding of users' information needs is the foundation of WIAS. Web users are simultaneously consumers and producers of Web information, so their needs can be inferred by analyzing the content they browse, their behavior, and the information they publish. Once the information needs are known, retrieving relevant results from the vast amount of Web resources and presenting them in a more user-friendly style is the key to the success of WIAS. In addition, because users usually expect information to be delivered promptly, ensuring the performance of WIAS is also a vital part of research on information acquisition.

To address these issues, an adaptive information pull technique based on measuring the ambiguity of user requests is proposed first. The presentation style of the pulled results is decided adaptively according to the quantified ambiguity of the request. For result filtering and presentation, a ranking algorithm and a clustering algorithm, both based on the fusion of multiple features, are proposed respectively. The two algorithms are validated on two representative kinds of emerging Internet resources: multimedia resources (images, in this thesis) and frequently updated dynamic resources (blogs, in this thesis).

Second, adaptive information push techniques based on user modeling are proposed for information publishers and information browsers. For publishers, blogs, a popular personal publishing platform, are taken as the research environment, and an approach that models users from their blog posts is proposed; based on it, the blogspace is partitioned into communities, and bloggers with similar interests are recommended as friends. For browsers, the content currently being browsed is treated as evidence of the user's profile, and a contextual advertising method based on sentiment and topic analysis is proposed, which ensures that the advertisements pushed are not only topically relevant but also consistent with the user's underlying attitudes, and are therefore better targeted.

Next, a hybrid strategy for distributed index organization in search engines (a typical information pull application), named Loc-Glob, is proposed; it offers both high performance and good scalability. Further optimizations are built on Loc-Glob. To balance the workload across index servers, the index is redistributed and replicated based on an analysis of index term workload and user query streams. Query paths across index servers are also optimized based on real-time workload to improve load balancing.

Based on the above work, a prototype blog information acquisition system that adopts adaptive techniques is designed and implemented. The system provides applications such as a blog search engine, blog friend recommendation, and advertisement recommendation, which validate the feasibility of the adaptive techniques proposed in this thesis for the two types of information acquisition service. Finally, conclusions and future work are presented.
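
The abstract's idea of choosing a presentation style from a query's measured ambiguity might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how ambiguity is quantified, so the sketch assumes normalized Shannon entropy over topic labels of the top results, and the names `query_ambiguity` and `choose_presentation`, the topic labels, and the 0.6 threshold are all hypothetical.

```python
import math
from collections import Counter


def query_ambiguity(result_topics):
    """Return a score in [0, 1] for how evenly the results spread over topics.

    Assumption (not from the thesis): ambiguity is the Shannon entropy of the
    topic labels of the top results, normalized by the maximum entropy for the
    number of distinct topics observed.
    """
    counts = Counter(result_topics)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no results, or a single topic: treat as unambiguous
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


def choose_presentation(result_topics, threshold=0.6):
    """Pick clustered output for ambiguous queries, a ranked list otherwise.

    The threshold is a hypothetical tuning parameter.
    """
    if query_ambiguity(result_topics) > threshold:
        return "clustered"
    return "ranked_list"


if __name__ == "__main__":
    # Results for a query spread evenly over three senses: ambiguous.
    print(choose_presentation(["car", "animal", "os"]))
    # Results dominated by a single sense: present as a plain ranked list.
    print(choose_presentation(["car"] * 9 + ["animal"]))
```

In this sketch an ambiguous query leads to clustered output, so each sense of the query can be shown as its own group, while a clear query keeps a single ranked list.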

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year and issue】 2008, Issue 09