The Study of Association Rule and Utility Pattern Mining Algorithms in Databases

【Author】 Xiong Xuedong (熊学栋)

【Advisor】 Xiao Jianhua (肖建华)

【Author information】 Xiangtan University, Computer Application Technology, 2007, Master's degree

【Abstract】 As digitization spreads through companies and government agencies, databases play an increasingly important role. The large volumes of data they accumulate often conceal useful information, and extracting that information efficiently and correctly has become an important problem; data mining emerged to address it. Association rule mining is currently the most widely applied data mining technique, and many related techniques and studies have been proposed. The standard association rule model treats every item equally and considers only whether an item appears in a transaction. In practice, items differ markedly, and these differences can be quantified; one way is to measure them by utility. Alongside a new association rule algorithm, this thesis studies utility pattern mining in detail, an emerging branch of data mining. Utility pattern discovery resembles association rule mining and sequence analysis: all three work on the same kind of data, customer records that extend market-basket data, and all seek potentially useful, non-trivial knowledge that supports decisions. Utility mining differs in requiring patterns to meet a minimum utility value, so it can be viewed as a form of constrained itemset mining. For association rules, the thesis gives an algorithm based on partition and decomposition. Partitioning the database reduces the memory required, and the algorithm scans the database only once, substantially reduces the number of candidate itemsets, and narrows the range considered when checking candidates. Experiments show a considerable improvement in efficiency. For utility mining, the thesis builds on previous work, recasts the problem as an optimization problem, and proposes a heuristic algorithm based on a dynamic binary partition tree that finds utility patterns in the data effectively. The experiments indicate that it outperforms pruning-based utility pattern discovery algorithms and is more robust than earlier ones. The main contributions are thus an efficient association rule algorithm and a new utility mining algorithm, whose performance is compared and verified experimentally.
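The contrast the abstract draws between support (every item counts equally) and utility (items are weighted) can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes the common definition in which an itemset's utility is the sum of quantity times per-unit profit over the transactions containing it, and it enumerates itemsets by brute force. The data, the `unit_profit` table, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 1, "milk": 2},
    {"bread": 2, "milk": 1},
    {"bread": 1, "caviar": 1},
    {"milk": 3},
]
# Hypothetical per-unit profit of each item (an assumed utility measure).
unit_profit = {"bread": 1, "milk": 2, "caviar": 20}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if all(i in t for i in itemset))
    return hits / len(transactions)


def utility(itemset, transactions, unit_profit):
    """Total profit of the itemset over the transactions that contain it."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force search over all itemsets; exponential, illustration only."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u >= min_utility:
                result[itemset] = u
    return result


if __name__ == "__main__":
    print(support(("caviar",), transactions))
    print(high_utility_itemsets(transactions, unit_profit, min_utility=15))
```

In this toy data `caviar` appears in only one of four transactions, so a support threshold would discard it, yet it passes a minimum utility of 15. Note also that `("bread",)` falls below the threshold while its superset `("bread", "caviar")` does not, so the downward-closure property that support-based pruning relies on does not carry over to utility. That is one plausible reason the problem invites constrained-mining or optimization formulations such as the one the abstract describes.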

  • 【Online publication contributor】 Xiangtan University
  • 【Online publication issue】 2008, Issue 05
  • 【Classification number】 TP311.13
  • 【Times cited】 2
  • 【Downloads】 165