Research on Web Prediction Model Based on Web-Log

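【Author】 Liu Chaohui (刘超慧)

【Advisor】 An Jiancheng (安建成)

【Author information】 Taiyuan University of Technology, Computer Application Technology, 2008, Master's thesis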

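【Abstract (translated from Chinese)】 With the rapid growth of Internet information and users, effectively reducing user access latency and improving network quality of service has become an urgent problem, and caching and prefetching are effective ways to address it. However, as the share of dynamic content and personalized services on the WWW keeps increasing, caching no longer improves network performance significantly. Prefetching is an effective complement to caching and the most effective way to break through the upper bound of caching performance, and it is increasingly becoming a research hotspot in Web acceleration. For Web page prediction, the Markov model is a simple and effective tool, but existing prediction methods suffer from a conflict between prediction accuracy and prediction coverage, as well as high storage complexity. Improving Markov-model-based prediction of user browsing paths has therefore become a new topic in Web log mining. This thesis comprehensively analyzes the state of research at home and abroad on browsing path prediction with Markov models, points out the problems of existing prediction methods, proposes improvements, and studies how to improve Markov-model-based prediction methods. The thesis first introduces the origin, development, and current state of the Internet and the WWW, and presents the problems facing the Internet and their solutions. It then explains the basic concepts and classification of Web data mining and the general methods and process of data preprocessing. It introduces a commonly used mining algorithm, the association rule algorithm, and proposes an improved algorithm that addresses its shortcomings. Next, the thesis proposes a new measure of user browsing interest preference, because traditional measures of user interest in a page cannot reflect the user's real browsing interest or the importance of a page. The new measure considers not only how often a page is browsed but also the page access time and the size of the page itself, making up for the shortcomings of traditional methods; experiments demonstrate its effectiveness. The author then proposes a two-step Markov prediction model, which mainly addresses the excessive space complexity and gradually declining coverage of higher-order Markov models. Building on it, a hybrid Markov model is proposed, together with its theoretical support and a method for solving its parameters; analysis and comparison of time and space complexity show that the hybrid Markov model outperforms the second-order Markov model in both respects. Finally, the proposed prefetching model is evaluated on real Web logs and the experimental results are analyzed.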

【Abstract】 With the rapid growth of Web information and users, reducing user-perceived access latency and improving the quality of service of the network has become a crucial problem, and Web prefetching and Web caching are the primary solutions. Web caching has been widely deployed at different points of the Internet, but as dynamic documents and personalized services increase, its performance deteriorates significantly. As a result, Web prefetching, which is an efficient complement to Web caching and the most effective way to break the upper bound of caching performance, is becoming a hotspot in Web acceleration research.

The Markov model is a simple and practical tool for Web prefetching, but existing prediction methods based on it still have shortcomings, so improving these methods has become a new topic in Web log mining. This thesis analyzes current domestic and international research on using Markov models for Web prediction, identifies problems in existing Markov-based prediction methods, and studies how to improve them.

First, the thesis introduces the development and current state of the Internet and the WWW, and presents the problems the Internet faces and the corresponding solutions. It describes the concept and classification of Web data mining and the data preprocessing process of Web log mining. To overcome the drawbacks of the Apriori algorithm for mining frequent itemsets, the TIMV algorithm is proposed.

Second, interest is a person's selective attitude toward objective things, and accurately measuring a user's browsing interest is the basis of Web schema mining. The thesis analyzes the shortcomings of current ways of measuring and expressing user browsing interest. For instance, overly simple measures often make it difficult to distinguish what the user is interested in, and they ignore factors such as the influence of the amount of information on a page on the user's browsing time. A method based on users' browsing behavior is therefore proposed to measure their browsing interest.

Then, a hybrid Markov prediction model is put forward based on the step-2 Markov model, which addresses the problems of high memory demand and low applicability. The thesis also gives the supporting theory and the method for obtaining the parameters.

Finally, experiments are carried out with the prediction model and the experimental results are analyzed.
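As a rough illustration of Markov-based page prediction in general, the sketch below counts page transitions in browsing sessions and predicts the next page, trying a two-page context first and falling back to a one-page context when the longer context was never observed. This is a generic order-fallback scheme under stated assumptions, not the thesis's step-2 or hybrid Markov model, whose definition and parameters the abstract does not give. The function names `train` and `predict` and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def train(sessions, order):
    """Count how often each page follows each `order`-page context."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for i in range(order, len(pages)):
            context = tuple(pages[i - order:i])
            counts[context][pages[i]] += 1
    return counts


def predict(history, first_order, second_order):
    """Return the most frequent next page, or None if no context matches.

    The two-page context is tried first; the one-page context is a fallback.
    """
    for order, model in ((2, second_order), (1, first_order)):
        if len(history) >= order:
            context = tuple(history[-order:])
            if context in model:
                return model[context].most_common(1)[0][0]
    return None


# Hypothetical sessions, e.g. as reconstructed from a preprocessed Web log.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["D", "B", "E"],
    ["A", "B", "C", "D"],
]
m1 = train(sessions, order=1)
m2 = train(sessions, order=2)

print(predict(["D", "B"], m1, m2))  # two-page context (D, B) was observed
print(predict(["X", "B"], m1, m2))  # falls back to the one-page context (B,)
```

Falling back to a shorter context is one common way to trade accuracy against coverage: longer contexts tend to predict more precisely but match fewer histories, and each additional order multiplies the number of contexts that must be stored.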

  • 【Classification number】TP393.09
  • 【Times cited】2
  • 【Downloads】148