Research and Application of Association Rule in Data Mining (数据挖掘中关联规则的研究与应用)

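【Author】 Zhang Youzhi (张友志)

【Advisors】 Zhou Xixiang (周熙襄); Zhong Benshan (钟本善)

【Author information】 Chengdu University of Technology, Signal and Information Processing, 2004, Master's thesis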

【Abstract】 With the development of computer technology and the spread of the Internet, the conflict between the explosive growth of the Web and of Web usage information on one side, and the limits of human attention on the other, has sharpened. Web data mining is an effective way to address this conflict, but because Web data and Web applications have their own characteristics, traditional techniques cannot be applied directly to mining information on the Web. Web logs record users' visits to a Web site and hold a large amount of path information. Analyzing this information helps designers understand how users behave when visiting the Web, and the results can be used to optimize the site structure and reorganize its pages.

Traditional association rule mining discovers relationships among transaction items in a database consisting of a set of transaction records. This work introduces the notion of association rules into a Web mining system and represents users' access paths in the form of association rules, with the aim of discovering users' access patterns from their behavior in a hypertext system. The thesis gives a systematic discussion of association rules in data mining. After surveying the categories, research content, and current state of Web data mining, it gives heuristic rules and a formal description for deriving user paths from raw log data. On this basis, it presents two methods for discovering users' access association rules. The first uses the Maximal Forward References (MF) method, whose final step resembles the Apriori algorithm of traditional data mining. The second treats the hypertext system as a weighted directed graph; after redefining confidence and support so that they suit the representation of user access paths, it derives a composite association rule mining algorithm.
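The abstract names the Maximal Forward References (MF) method but gives no procedure, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a session is an ordered list of page identifiers for one user, that a backward reference is a revisit of a page already on the current path, and that support and confidence use the plain transaction-style definitions over consecutive page pairs. The thesis redefines both measures for its weighted directed graph, and those definitions are not reproduced here. The function names `maximal_forward_references` and `pair_rules` are hypothetical.

```python
from collections import Counter


def maximal_forward_references(session):
    """Split one user's click sequence into maximal forward paths.

    Assumption: revisiting a page already on the current path is a
    backward move. A path is emitted only when a backward move (or the
    end of the session) follows a forward move.
    """
    paths = []
    path = []
    extending = False  # True while the most recent move was forward
    for page in session:
        if page in path:
            if extending:
                paths.append(list(path))
            # Backtrack to the revisited page and continue from there.
            path = path[: path.index(page) + 1]
            extending = False
        else:
            path.append(page)
            extending = True
    if extending and path:
        paths.append(list(path))
    return paths


def pair_rules(paths, min_support, min_confidence):
    """Score rules X -> Y for pages visited consecutively on a path.

    Assumed definitions (not the thesis's redefined ones):
      support(X -> Y)    = paths containing the step X, Y / all paths
      confidence(X -> Y) = paths containing the step X, Y / paths containing X
    """
    total = len(paths)
    page_count = Counter()
    pair_count = Counter()
    for p in paths:
        page_count.update(set(p))
        pair_count.update(set(zip(p, p[1:])))
    rules = {}
    for (x, y), count in pair_count.items():
        support = count / total
        confidence = count / page_count[x]
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules


# Illustrative click sequence; each letter stands for one page.
session = list("ABCDCBEGHGWAOUOV")
paths = maximal_forward_references(session)
rules = pair_rules(paths, min_support=0.4, min_confidence=0.6)
```

For this made-up session the forward paths are ABCD, ABEGH, ABEGW, AOU, and AOV. A full Apriori-style pass would go on to grow longer candidate sequences from the frequent pairs; the sketch stops at pairs to stay short.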

  • 【Classification number】 TP311.13
  • 【Times cited】 3
  • 【Downloads】 331