Research on Related Problems of Association Rule Mining (关联规则挖掘的相关问题研究)

【Author】 Zhang Tiejun (张铁军)

【Advisor】 Yang Junrui (杨君锐)

【Author information】 Xi'an University of Science and Technology (西安科技大学), Computer Application Technology, 2009, Master's thesis

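【Abstract (translated from the Chinese)】 Association rule mining is an important research direction in data mining, and frequent pattern mining is a key technique and step in applications such as association rule mining and sequential pattern mining. Because frequent pattern mining is inherently computationally complex, frequent closed pattern mining and maximal frequent pattern mining have been proposed to improve mining efficiency. Both the set of frequent closed patterns and the set of maximal frequent patterns are smaller than the set of frequent patterns. The set of frequent closed patterns uniquely determines the complete set of frequent patterns together with their exact supports, whereas the maximal frequent patterns imply all frequent patterns, and some data mining applications need only the maximal frequent patterns. In addition, in practical applications the transaction database may change and users may adjust the minimum support to meet new needs, so how to update existing mining results is a problem worth studying. Furthermore, how to measure interestingness, a newer evaluation criterion for association rules, is also a topic of active interest. Studying these problems is therefore of real significance. This thesis studies several related problems in association rule mining; its main contents are as follows. First, it proposes the FCI-Miner algorithm for mining frequent closed patterns and the BFP-Miner algorithm for mining maximal frequent patterns. Both algorithms use an improved FP-Tree to store the transactions of the database in compressed form, and they exploit the properties of this tree so that neither conditional FP-Trees nor candidate patterns need to be generated during mining, which reduces the storage space and computation time required. Experimental results show that the algorithms perform well. Second, it proposes the IUMFPA algorithm for the integrated update mining of maximal frequent patterns when both the minimum support and the database change. The algorithm uses a complete FP-Tree and adjusts the previously mined maximal frequent patterns to update them quickly; experimental tests and analysis show that it has good time and space efficiency. Finally, to address the shortcomings of mining association rules under the support-confidence framework, it proposes an interestingness measure that reflects the correlation between itemsets and their rarity, and that can be used to discover rules with low support but strong confidence and high cohesion. An example analysis illustrates the effectiveness and practicality of the measure in some applications.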
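The relationship between frequent, frequent closed, and maximal frequent patterns stated in the abstract can be illustrated on a toy database. The following is only a brute-force sketch under stated assumptions: it is not FCI-Miner or BFP-Miner (which the abstract says work on an improved FP-Tree), and the function names and the four-transaction database are hypothetical, chosen purely for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration: map every frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                result[itemset] = count
    return result


def closed_itemsets(frequent):
    """Closed: no proper superset has the same support count."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == frequent[t] for t in frequent)}


def maximal_itemsets(frequent):
    """Maximal: no proper superset is frequent at all."""
    return {s: c for s, c in frequent.items()
            if not any(s < t for t in frequent)}


def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support from the closed set alone:
    it equals the largest support among its closed supersets."""
    return max(c for s, c in closed.items() if itemset <= s)


# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("abc"),
                frozenset("ab"), frozenset("ad")]
frequent = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), "frequent,", len(closed), "closed,", len(maximal), "maximal")
for itemset, count in frequent.items():
    # Every frequent support is recoverable from the closed itemsets.
    assert support_from_closed(itemset, closed) == count
```

On this toy database the closed itemsets are {a}, {a, b}, and {a, b, c}, and the only maximal itemset is {a, b, c}. The maximal set tells us which itemsets are frequent (all its subsets), but only the closed set also preserves the exact supports.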

【Abstract】 Association rule mining is a very important problem in data mining, and frequent pattern mining plays a crucial role in association rule mining, sequential pattern mining, and related tasks. Because mining frequent patterns is time-consuming, mining frequent closed patterns and mining maximal frequent patterns have been proposed to improve mining efficiency. The set of frequent closed patterns or maximal frequent patterns is orders of magnitude smaller than the set of frequent patterns. The set of frequent closed patterns still contains enough information to recover the frequent patterns and their exact supports. The set of maximal frequent patterns implicitly contains all frequent patterns, and there are applications for which the set of maximal frequent patterns is adequate. In some applications, users may adjust the minimum support while the database changes and then have to update the earlier mining results, so this case is worth studying. Mining interesting rules is another issue of interest. Research on these issues is therefore significant. In this thesis, we study several related problems of association rule mining, as follows. Firstly, two efficient algorithms are presented: FCI-Miner for mining frequent closed patterns and BFP-Miner for mining maximal frequent patterns. Both algorithms are based on an improved FP-Tree (Frequent Pattern Tree), which compresses and stores the records of the transaction database, and both use a depth-first search strategy without generating conditional FP-Trees or candidate patterns. The experimental evaluation on a number of real and synthetic databases shows that our algorithms outperform previous methods in most cases. Secondly, a new integrated updating algorithm for mining maximal frequent patterns, IUMFPA, is proposed. It handles the case in which the user adjusts the minimum support while the database changes, in order to find more useful maximal frequent patterns. It uses an improved full FP-Tree structure and makes full use of the former FP-Tree and the previously mined results. The experimental results indicate that IUMFPA performs efficiently. Finally, we propose a concise measure of rule interestingness to overcome the shortcomings of the support-confidence framework. It captures the correlation and rarity of association rules, and in particular can be used to discover rules with strong correlation and high confidence but low support. We conclude with an example that demonstrates its effectiveness and practicality.
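To make concrete why the support-confidence framework can miss rare but strongly correlated rules, the sketch below computes support, confidence, and lift for two rules. Note the assumptions: the abstract does not give the formula of the thesis's interestingness measure, so lift is used here only as a standard, well-known stand-in for a correlation-aware measure; the function name `rule_metrics` and the shopping-basket data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Lift is a standard correlation measure used here as a stand-in;
    it is NOT the interestingness measure proposed in the thesis.
    """
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_both = sum(1 for t in transactions if (a | c) <= t)
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical database of 100 baskets: one rare pair, one very common pair.
transactions = (
    [frozenset({"caviar", "vodka"})] * 2
    + [frozenset({"bread", "milk"})] * 90
    + [frozenset({"bread"})] * 8
)

for lhs, rhs in [({"caviar"}, {"vodka"}), ({"bread"}, {"milk"})]:
    support, confidence, lift = rule_metrics(transactions, lhs, rhs)
    print(f"{sorted(lhs)} -> {sorted(rhs)}: "
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

With a minimum support of, say, 0.05, the rule caviar -> vodka (support 0.02) would be pruned even though its confidence is 1.0 and its lift is about 50, whereas bread -> milk passes easily with a lift close to 1. This is the kind of low-support, high-confidence rule the abstract says an interestingness measure should be able to surface.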
