Study of Data Mining Algorithms Based on Association Rules

【Author】 Wu Rentang (吴仁堂)

【Supervisor】 Zhou Genbao (周根宝)

【Author information】 Inner Mongolia Agricultural University, Computer Application Technology, 2010, Master's degree

【Abstract】 Association rule mining is one of the most active research directions in data mining; association rules capture meaningful relationships between itemsets. They can be applied in a wide range of fields, both to verify knowledge patterns long established within an industry and to discover previously hidden regularities. Discovering, understanding, and applying association rules effectively is therefore an important means of accomplishing data mining tasks. Association rule mining still needs improvement in efficiency and accuracy, and it needs new, more effective algorithms. This thesis first analyzes and summarizes the concepts of association rule mining and its typical algorithms. Starting from the basic Apriori algorithm, it then examines the existing classical algorithms and points out the shortcomings of their traditional search methods and frequency counting. A second important problem with traditional algorithms is that the generated rules include a large number of redundant rules, which makes them very difficult for users to analyze and use, so pruning redundant rules has become an important topic. Many methods exist for reducing redundancy, but current pruning techniques focus mainly on positive association rules, although pruning redundant negative association rules is equally important. This thesis therefore discusses negative association rule mining in depth alongside the pruning of positive association rules. Building on existing algorithms, it proposes a new redundant-rule pruning algorithm that applies the definition of correlation from probability theory to further prune the generated rules. It then presents APM, an association rule algorithm based on pattern matrix matching, and analyzes its performance. APM scans the database only once and does not access it again; it uses a matrix encoding to determine whether a candidate k-itemset is frequent, which greatly improves the efficiency of association rule mining and gives the algorithm practical value for data mining. The experimental results demonstrate that the algorithm is correct and effective.
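
The two ideas named in the abstract, matrix-based frequency counting after a single database scan and correlation-based pruning, can be illustrated with a minimal Python sketch. This is not the thesis's APM algorithm or its pruning algorithm, whose details the abstract does not give. It assumes a simple bit-matrix encoding (one integer bitmask per item, one bit per transaction) and the common probabilistic correlation measure P(A and B) / (P(A) P(B)). All function names and the sample transactions are hypothetical.

```python
def build_item_masks(transactions):
    """Single pass over the data: map each item to a bitmask of the rows containing it."""
    masks = {}
    for row, items in enumerate(transactions):
        for item in items:
            masks[item] = masks.get(item, 0) | (1 << row)
    return masks


def support(itemset, masks, n_rows):
    """Fraction of rows containing every item, computed from the masks alone."""
    combined = (1 << n_rows) - 1  # start with every row selected
    for item in itemset:
        combined &= masks.get(item, 0)  # keep only rows that also contain this item
    return bin(combined).count("1") / n_rows


def correlation(antecedent, consequent, masks, n_rows):
    """P(A and B) / (P(A) * P(B)); a value of 1.0 indicates independence."""
    p_a = support(antecedent, masks, n_rows)
    p_b = support(consequent, masks, n_rows)
    if p_a == 0 or p_b == 0:
        return 0.0  # assumed convention for itemsets that never occur
    p_ab = support(set(antecedent) | set(consequent), masks, n_rows)
    return p_ab / (p_a * p_b)


# Hypothetical sample data, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
masks = build_item_masks(transactions)
n = len(transactions)

print(support({"bread", "milk"}, masks, n))
print(correlation({"bread"}, {"milk"}, masks, n))
print(correlation({"bread"}, {"butter"}, masks, n))
```

In this sample, the correlation of bread with milk is below 1 and that of bread with butter is above 1. Under the assumed measure, a pruning step might drop the positive rule bread → milk, or consider a negative rule instead, while keeping bread → butter. The original transactions are not read again after `build_item_masks` returns, which mirrors the single-scan property the abstract attributes to APM.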

【Key words】 Data mining; Association rules; Pruning; APM algorithm