
Research on Two-Tier Structure Clustering Mining Based on Data Streams (基于数据流双层结构聚类挖掘的研究)

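【Author】 Chu Hongtao (楚红涛)

【Advisor】 Han Feng (寒枫)

【Author information】 North China Electric Power University (Hebei), Computer Application Research, 2008, Master's thesis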

【Abstract】 With the development of computer technology, a growing number of applications generate stream data. Unlike conventional static data stored on disk, stream data is continuous, ordered, fast-changing, and massive. The main contribution of this thesis is the design and implementation of TWDSCluster, a two-tier clustering algorithm for stream data that consists of two parts: an online clustering layer and an offline clustering layer. To store and retain summary information about the data points in the stream efficiently, the framework introduces two concepts, the microcluster and the pyramidal time frame. Summary statistics of the data points are retained in the form of microclusters, which are stored according to the pyramidal time frame. The algorithm can also detect outliers in the data stream efficiently. Simulation experiments and comparisons with other algorithms show that TWDSCluster is efficient and achieves higher clustering accuracy within limited memory. Finally, the thesis summarizes its content and outlines directions for future work.
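The abstract names microclusters and a pyramidal time frame without giving their definitions. The following is a minimal Python sketch of how such structures are commonly built in two-tier stream clustering, not the TWDSCluster implementation itself. It assumes an additive summary (count, per-dimension linear sum and squared sum) and a base-`alpha` snapshot ordering; the names `MicroCluster`, `snapshot_order`, and `PyramidalStore`, and the parameters `alpha` and `capacity`, are hypothetical.

```python
import math
from collections import defaultdict, deque


class MicroCluster:
    """Additive summary of a group of points (assumed form, not the thesis's)."""

    def __init__(self, dim):
        self.n = 0                 # number of points absorbed
        self.ls = [0.0] * dim      # per-dimension linear sum
        self.ss = [0.0] * dim      # per-dimension sum of squares

    def insert(self, point):
        """Absorb one point by updating the three running statistics."""
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square deviation of the absorbed points from the centroid."""
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


def snapshot_order(t, alpha=2):
    """Largest i such that alpha**i divides the clock time t (t >= 1)."""
    if t < 1:
        raise ValueError("clock time must be a positive integer")
    order = 0
    while t % alpha == 0:
        t //= alpha
        order += 1
    return order


class PyramidalStore:
    """Keeps only the most recent `capacity` snapshots at each order."""

    def __init__(self, alpha=2, capacity=3):
        self.alpha = alpha
        self.levels = defaultdict(lambda: deque(maxlen=capacity))

    def store(self, t, snapshot):
        # A full deque silently drops its oldest snapshot on append.
        self.levels[snapshot_order(t, self.alpha)].append((t, snapshot))

    def times(self, order):
        return [t for t, _ in self.levels[order]]
```

Under these assumptions, an online layer would call `insert` for each arriving point and periodically hand the current microclusters to `store`, while an offline layer would cluster the stored summaries over a chosen time horizon. Because recent snapshots are kept at fine granularity and older ones at progressively coarser granularity, memory grows only logarithmically with the length of the stream.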

  • 【Classification No.】 TP311.13
  • 【Downloads】 121