关联规则基本技术研究

Research on the Basic Technology of Association Rules

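【Author】 Guo Yunkai (郭运凯)

【Supervisor】 Yang Junrui (杨君锐)

【Author Information】 Xi'an University of Science and Technology, Computer Application Technology, 2009, Master's thesis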

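【Abstract (translated from the Chinese original)】 Data mining is the process of discovering latent, novel, valuable, usable, and user-understandable patterns and information in large databases. Association rule mining is an important research area within data mining; its main goal is to discover association relationships among attributes in a database. Building on an extensive review of the domestic and international literature, this thesis analyzes several problems of association rule algorithms in depth. The main research content and results are as follows. First, it proposes SFP-Miner, a maximal frequent itemset mining algorithm based on the sorted FP-Tree (Sorted FP-Tree, SFP-Tree for short). SFP-Miner scans the database twice and stores the frequent items contained in each transaction in compressed form in the SFP-Tree. During mining, it exploits the properties of the SFP-Tree and applies subtree-merging and pre-pruning strategies to perform a depth-first search on the SFP-Tree without rescanning the database, which reduces the storage space and computation time used during mining. Experimental results show that the algorithm performs well. Second, it proposes UAMFI, an update mining algorithm for maximal frequent itemsets based on the fully merged SFP-Tree. The algorithm performs a depth-first search directly on the tree and can quickly update the maximal frequent itemsets. Experimental tests and analysis of the results show that the algorithm updates maximal frequent itemsets efficiently. Finally, for the problem of mining association rules over multi-valued attributes, it proposes DBSMiner, a multi-valued attribute association rule mining algorithm based on high-dimensional clustering. Drawing on the idea of ARCS, the algorithm first partitions each dimension of the high-dimensional dataset, then sorts the density cells, and proposes a grid-based high-dimensional clustering algorithm to cluster and mine the partitioned data. Theoretical analysis and experimental results show that DBSMiner has good execution efficiency and accuracy and can effectively mine association rules over multi-valued attributes.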
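
As a rough illustration of the two-scan construction described above, the following Python sketch builds a prefix tree from the frequent items of each transaction and, separately, enumerates maximal frequent itemsets by brute force. It is not the thesis's SFP-Miner: the names `Node`, `build_tree`, and `maximal_frequent_itemsets` are hypothetical, the item ordering (descending support, ties broken by item name) is an assumption, and the subtree-merging and pre-pruning strategies are not reproduced.

```python
from collections import Counter
from itertools import combinations


class Node:
    """One prefix-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_tree(transactions, min_count):
    # Scan 1: count the support of every single item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_count}

    def order(item):
        # Assumed ordering: descending support, ties broken by item name.
        return (-support[item], item)

    # Scan 2: insert each transaction's frequent items in sorted order,
    # so transactions that share a prefix share tree nodes.
    root = Node()
    for t in transactions:
        node = root
        for item in sorted(set(t) & frequent, key=order):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, frequent


def maximal_frequent_itemsets(transactions, min_count):
    """Brute-force reference: frequent itemsets with no frequent proper superset."""
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    found = []
    # Largest candidates first, so any frequent superset is already in `found`.
    for size in range(len(items), 0, -1):
        for cand in combinations(items, size):
            c = set(cand)
            if any(c < m for m in found):
                continue  # covered by a larger maximal itemset
            if sum(c <= s for s in sets) >= min_count:
                found.append(c)
    return sorted(sorted(m) for m in found)
```

The brute-force function is exponential in the number of items and is only meant to make the definition of a maximal frequent itemset concrete; a tree-based miner avoids this enumeration by searching the compressed tree instead of the raw transactions.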

【Abstract】 Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases. Association rule mining, an important field of data mining, discovers interesting relationships among attributes in the data. Based on a study of the domestic and international literature, this thesis investigates several basic problems of association rule mining algorithms. The main contents are as follows. First, SFP-Miner, a maximal frequent itemset mining algorithm based on the Sorted FP-Tree (SFP-Tree), is proposed. SFP-Miner scans the database twice and stores the frequent items of each transaction in compressed form in the SFP-Tree. Using a depth-first strategy, the algorithm prunes the search space with pre-pruning and subtree-merging strategies and discovers all maximal frequent itemsets efficiently without rescanning the database. The experimental results indicate that SFP-Miner is an efficient algorithm. Second, a new updating algorithm, UAMFI, is presented for mining maximal frequent itemsets from a transaction database when the user changes the minimum support. The algorithm adopts a new data structure, the FMSFP-Tree (Full Merged SFP-Tree), which stores the frequent itemsets for any given minimum support, and it mines and updates the maximal frequent itemsets directly in the FMSFP-Tree. It can efficiently mine maximal frequent itemsets under a changed minimum support, and the experimental results indicate that it is highly efficient for update mining problems. Finally, a new algorithm, DBSMiner (Density Based Sub-space Miner), is presented for mining association rules over quantitative attributes. Drawing on ARCS (Association Rule Clustering System), the algorithm uses a grid structure to quantize the object space into a finite number of cells, sorts the dense cells in descending order, and uses a grid-based clustering algorithm to cluster the data over all attributes. As a last step, it clusters the association rules. Theoretical analysis and experimental results show that DBSMiner has good performance and accuracy and can effectively mine association rules over quantitative attributes.
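
The grid quantization step mentioned for DBSMiner can be pictured with a small Python sketch. It only shows uniform binning of points into cells and sorting the dense cells; it is not the thesis's algorithm. The function name `dense_cells`, the uniform bin width, the `min_points` density threshold, and sorting by point count are all assumptions made for illustration, and the subsequent clustering of cells and rules is not reproduced.

```python
from collections import Counter


def dense_cells(points, lows, highs, bins, min_points):
    """Quantize points into a uniform grid and return dense cells, densest first."""

    def cell_of(point):
        idx = []
        for x, lo, hi in zip(point, lows, highs):
            k = int((x - lo) / (hi - lo) * bins)
            # Clamp so that x == hi falls into the last bin.
            idx.append(min(max(k, 0), bins - 1))
        return tuple(idx)

    # Count how many points land in each grid cell.
    counts = Counter(cell_of(p) for p in points)
    # Keep cells meeting the assumed density threshold.
    dense = [(cell, n) for cell, n in counts.items() if n >= min_points]
    # Assumed sort key: descending point count, ties broken by cell index.
    return sorted(dense, key=lambda cn: (-cn[1], cn[0]))
```

For example, with two attributes ranging over [0, 10] and five bins per dimension, points clustered near (1, 1) and (8, 8) fall into cells (0, 0) and (4, 4), and only cells holding at least `min_points` points are kept for the later clustering stage.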
