The Research on the Distributed Algorithm of Mining Frequent Closed Patterns

【Author】 Zhang Min (张敏)

【Advisor】 Yang Junrui (杨君锐)

【Author Information】 Xi'an University of Science and Technology, Computer Application Technology, 2011, Master's thesis

【Abstract】 Association rule mining is one of the central problems in data mining research; its goal is to uncover latent relationships among the items in large databases. Its core task is frequent pattern mining. Because mining the complete set of frequent patterns is expensive, frequent closed pattern mining was proposed to improve efficiency: the set of frequent closed patterns uniquely determines the complete set of frequent patterns together with their exact supports, yet it is far smaller. Many results have been obtained for mining frequent closed patterns on a single processor. As distributed environments become more common, however, traditional serial algorithms can no longer handle the mining problems that arise there, so designing high-performance distributed algorithms for mining frequent closed patterns is particularly important. Building on a study of typical association rule mining algorithms and their strengths and weaknesses, this thesis brings distributed computing into association rule mining and proposes two distributed algorithms for mining frequent closed patterns. The work has two main parts.

In the first part, a distributed algorithm for mining frequent closed patterns, PFCI_Miner, is proposed. The algorithm distributes tasks in a master-slave fashion. The master processor partitions the mining task by sending out the prefix path table (PrePthx) proposed in the thesis, and the slave processors mine local frequent closed patterns with the help of the proposed storage tree (Trac-tree). Finally, the master processor mines the global set of frequent closed patterns. The algorithm uses a star topology, so data is exchanged only between the master processor and the slave processors; the slave processors neither communicate with one another nor need to synchronize. Experimental results show that PFCI_Miner achieves good efficiency.

In the second part, taking into account the characteristics of data streams and of distributed algorithms, a distributed algorithm for mining frequent closed patterns from data streams, DSFC_Miner, is proposed. The algorithm partitions the data stream into segments, mines the critical frequent closed patterns of each segment, and builds a corresponding local FCI_DS tree to store them. Finally, by merging the local FCI_DS trees, it obtains the set of frequent closed patterns in the current data stream within an allowed error bound. Experimental results suggest that DSFC_Miner is feasible, fast, and effective.
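
The property that motivates the thesis, namely that the frequent closed patterns determine every frequent pattern and its exact support, can be checked on a toy example. The following is a minimal brute-force sketch in Python. It is not PFCI_Miner or DSFC_Miner, and it uses none of the thesis's data structures (PrePthx, Trac-tree, FCI_DS tree); the function names and the sample transactions are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute force: map each itemset contained in at least min_count
    transactions to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                result[candidate] = count
                found = True
        if not found:
            # No frequent itemset of size k, so none of size k + 1 either.
            break
    return result


def closed_itemsets(frequent):
    """Keep the itemsets that have no proper superset with the same support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


def support_from_closed(itemset, closed):
    """Recover the support of an itemset from the closed itemsets alone:
    the largest support among the closed supersets, or None if the
    itemset is not frequent."""
    counts = [c for s, c in closed.items() if itemset <= s]
    return max(counts) if counts else None


# Hypothetical toy database of five transactions.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
    frozenset("bc"),
]

frequent = frequent_itemsets(transactions, min_count=2)
closed = closed_itemsets(frequent)

print(len(frequent), "frequent itemsets,", len(closed), "closed")
for s in sorted(frequent, key=lambda x: (len(x), sorted(x))):
    print("".join(sorted(s)), frequent[s], support_from_closed(s, closed))
```

On this toy database the closed set is smaller than the full frequent set, yet `support_from_closed` returns the same count as the direct scan for every frequent itemset. The brute-force enumeration is exponential in the number of items, which is the cost that dedicated closed-pattern miners are designed to avoid.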
