基于兴趣度的关联规则挖掘算法的研究

Mining Algorithm Research for Association Rules Based on Interest Measure

【Author】 Chen Anlong (陈安龙)

【Advisor】 Tao Hongcai (陶宏才)

【Author information】 Southwest Jiaotong University, Computer Applications, 2003, Master's thesis

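【Abstract (translated from the Chinese)】 Data mining is a knowledge discovery technique aimed at massive data, and research into efficient mining algorithms is one of its important topics. Association rules are one of the important patterns in data mining and have great application value. This thesis mainly studies how to improve the effectiveness and scalability of algorithms for mining Boolean association rules. Apriori is an algorithm for mining Boolean association rules, but its space and time complexity impose limitations that are hard to overcome. The thesis therefore introduces a frequent-pattern growth algorithm that does not need to generate candidate itemsets: it compresses the transaction information of the database into an FP-tree and then generates frequent patterns by joining suffixes with prefixes, which avoids scanning the database many times and reduces the time cost. When the database contains many items and a huge number of transactions, the frequent-pattern growth algorithm has a large memory cost and may run out of memory. The thesis therefore proposes a pattern-growth algorithm based on maximal clique partitioning, which decomposes the transaction item set into several subsets and applies the frequent-pattern growth algorithm to each subset separately to find its frequent patterns, thereby resolving the memory shortage. It also proposes a method that uses an adjacency matrix to generate frequent 2-itemsets, which reduces the number of database scans. How to select the rules that are interesting and valuable to the user from a large number of association patterns is another important topic in algorithm research. The framework based on support and confidence has certain limitations, so the thesis introduces an influence-based interest measure into this framework to prune uninteresting rules and thereby select the rule patterns the user is genuinely interested in.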

【Abstract】 Data mining is a knowledge discovery technique for massive data sets, and research into efficient mining algorithms is one of its central topics. Association rules are one of the important patterns in data mining and have significant application value. The core of this thesis is how to improve the effectiveness and scalability of algorithms for mining Boolean association rules. The Apriori algorithm finds Boolean association rules, but it is limited by its space and time complexity. This thesis therefore introduces a frequent-pattern (FP) growth algorithm that does not need to generate candidate itemsets. The algorithm compresses the transaction information in the database into an FP-tree and then produces frequent patterns by joining suffixes with prefixes, which avoids scanning the database many times and lowers the time cost. When the database contains a great many items and transactions, the frequent-pattern growth algorithm needs much more memory and may run out of it. This thesis therefore proposes a frequent-pattern growth algorithm based on maximal clique partitioning, which resolves the memory shortage by dividing the item set into several subsets and computing the frequent patterns of each subset separately. It also gives a new method that finds frequent 2-itemsets with an adjacency matrix and requires fewer database scans. How to select the interesting and valuable rules from a large number of association patterns is another important topic in the study of mining algorithms. Because the model based on support and confidence has limitations, this thesis introduces an influence-based interest measure into that model to prune uninteresting rules and retain the rules that users are genuinely interested in.
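The abstract does not describe how the adjacency matrix is built, so the following is only a minimal Python sketch of the general idea, not the thesis's method. It assumes the matrix is an upper-triangular table of pair co-occurrence counts that is filled in one counting pass over the transactions once the item index is known. The function name `frequent_pairs`, the parameter `min_support_count`, and the sample transactions are hypothetical.

```python
from itertools import combinations


def frequent_pairs(transactions, min_support_count):
    """Return {(item_a, item_b): count} for pairs meeting the support threshold.

    Assumption: the "adjacency matrix" is a table of co-occurrence counts,
    where counts[i][j] (i < j) is the number of transactions containing
    both item i and item j.
    """
    # Give every distinct item a fixed row/column index.
    items = sorted({item for t in transactions for item in t})
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    counts = [[0] * n for _ in range(n)]

    # One counting pass: each transaction increments the cell of every
    # pair of items it contains (upper triangle only).
    for t in transactions:
        ids = sorted({index[item] for item in t})
        for i, j in combinations(ids, 2):
            counts[i][j] += 1

    # Keep only the pairs whose count reaches the minimum support.
    return {
        (items[i], items[j]): counts[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if counts[i][j] >= min_support_count
    }


# Hypothetical sample data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
pairs = frequent_pairs(transactions, min_support_count=2)
```

In this sketch the matrix needs memory quadratic in the number of distinct items, which is one reason a partitioning step such as the one the abstract mentions could matter for large item sets.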

  • 【Classification number】TP311.13
  • 【Citation count】13
  • 【Download count】428