

Research on Web Usage Mining Based on Association Principle

【Author】 Fu Xiang (符翔)

【Advisor】 Jin Ou (金瓯)

【Author information】 Central South University, Computer Application Technology, 2010, Master's thesis

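【Summary】 With the spread and rapid growth of the Internet, the rapid development of e-commerce has drawn increasing attention from researchers, who hope to exploit the advantages of this new form of business to obtain greater economic benefit. Web servers record many of users' browsing actions in logs. These logs can serve as a basis for improving a website's topology and thereby its performance, and they also allow a deeper study of the particular ways users browse a site, so that customers can be offered more personalized service. This strong commercial demand gave rise to Web log mining, so research in this direction has considerable practical significance and value. This thesis studies Web usage mining in some depth. It first surveys the basic theory and classification of Web mining and Web log mining, and describes the data sources and the content and format of log records. It then examines the log preprocessing stage of log mining, which comprises data cleaning, user identification, session identification, frame filtering, path completion, and transaction identification. Next, it introduces the basic concepts of the association principle in detail and describes Apriori, the classic algorithm based on it. The main contribution is the idea of placing the transaction set into a transaction matrix on top of Apriori, which improves the original algorithm in certain respects. The improved algorithm first removes the home page, which markedly reduces the dimension of the matrix, and then no longer needs to search candidate itemsets, which improves computational efficiency. Theoretical analysis and experiments show that the improved algorithm is effective and feasible. Association rules are then derived from the frequent itemsets, so that related rules are obtained from the Web log. Finally, following the Web log mining workflow, a basic mining system is designed and implemented for experiments. The system consists of three parts: a data preprocessing module, a frequent pattern mining module, and an association rule mining module.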

【Abstract】 With the rapid development and popularization of the Internet, the growth of electronic commerce has attracted increasing attention from researchers, who expect to obtain greater economic benefit from this new business model by exploiting its advantages. Web servers record users' browsing behavior in the form of Web logs. These logs allow us to study the particular patterns in how users browse a website, so as to provide more personalized service, and they also provide a basis for improving the topology and performance of the site. The technology of Web log mining emerged to meet this business need, so research in this direction has considerable practical value. This thesis makes an intensive study of Web log mining. First, it surveys the basic knowledge and classification of Web mining and Web log mining, and introduces the content and format of the Web log in detail. Second, it discusses the data preprocessing stage of Web log mining, which includes data cleaning, user identification, session identification, frame filtering, path completion, and transaction identification. It then presents the basic concepts of the association principle and describes Apriori, the classic algorithm based on it. Building on Apriori, the thesis proposes placing the transaction set into a transaction matrix. The new algorithm first removes the home page, which significantly reduces the dimensions of the matrix; it then no longer needs to search for candidate itemsets, which improves computational efficiency. Theoretical analysis and experiments show that the improved algorithm is effective and feasible. The thesis also introduces a method for deriving association rules from the frequent itemsets. Finally, it designs and implements a Web mining system for experiments, following the Web log mining process. The system is divided into three parts: a data preprocessing module, a frequent pattern mining module, and an association rule mining module.
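The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea it describes, under stated assumptions: each session is treated as one transaction, the home page is dropped before the matrix is built, each page becomes one column of a boolean transaction matrix (stored here as an integer bitmask), and the support of an itemset is the number of rows in which all of its columns are set. The function names, the `/index` home page URL, and the absolute support threshold are all hypothetical. The sketch still extends frequent itemsets level by level, so it does not reproduce the thesis's candidate-free step, which the abstract does not specify.

```python
from itertools import combinations


def popcount(mask):
    """Number of set bits, i.e. number of transactions covered by the mask."""
    return bin(mask).count("1")


def build_columns(transactions, home_page):
    """Return {page: bitmask}. Bit i is set when transaction i contains the page.

    The home page is skipped, so it never becomes a column of the matrix.
    """
    columns = {}
    for i, pages in enumerate(transactions):
        for page in set(pages):
            if page == home_page:
                continue
            columns[page] = columns.get(page, 0) | (1 << i)
    return columns


def frequent_itemsets(transactions, home_page, min_support):
    """Return {itemset: support count} for itemsets with support >= min_support."""
    columns = build_columns(transactions, home_page)
    current = {
        frozenset([page]): mask
        for page, mask in columns.items()
        if popcount(mask) >= min_support
    }
    result = {}
    while current:
        result.update({items: popcount(mask) for items, mask in current.items()})
        next_level = {}
        for (a, mask_a), (b, mask_b) in combinations(current.items(), 2):
            union = a | b
            if len(union) != len(a) + 1 or union in next_level:
                continue
            # AND of the two masks marks the rows that contain every page in union.
            mask = mask_a & mask_b
            if popcount(mask) >= min_support:
                next_level[union] = mask
        current = next_level
    return result


def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) triples from frequent itemsets."""
    rules = []
    for items, support in itemsets.items():
        if len(items) < 2:
            continue
        for size in range(1, len(items)):
            for combo in combinations(sorted(items), size):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, items - antecedent, confidence))
    return rules


# Hypothetical sessions, invented for illustration only.
sessions = [
    ["/index", "/a", "/b"],
    ["/index", "/a", "/b", "/c"],
    ["/index", "/a", "/c"],
    ["/index", "/b"],
]
itemsets = frequent_itemsets(sessions, home_page="/index", min_support=2)
rules = association_rules(itemsets, min_confidence=0.9)
```

Dropping the home page removes a column that would otherwise appear in nearly every session and inflate the number of frequent itemsets without adding information. The bitmask form is one possible encoding of the matrix; a two-dimensional boolean array would work the same way.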

  • 【Online publication contributor】 Central South University
  • 【Online publication year and issue】 2011, Issue 03

