
Research on Multiple Time Series Streams Clustering Algorithms (多时间序列数据流聚类算法研究)

【Author】 Jin Yan (金燕)

【Advisor】 Liu Qingbao (刘青宝)

【Author information】 National University of Defense Technology, Management Science and Engineering, 2007, Master's

【Abstract】 As demand for data stream applications grows, mining algorithms on data streams are attracting increasing attention from researchers in China and abroad. For environments with multiple time series data streams, this thesis proposes a method for computing multi-level correlation between pairs of streams and, on that basis, mining algorithms for the multi-stream setting. The multi-level correlation computation has three advantages: (1) it improves on traditional statistical correlation methods for time series, adapting them to the variability and non-repeatability of data streams; (2) it uses the discrete Fourier transform (DFT) to compress and process stream data, which lowers memory requirements and shortens processing time; (3) it uses a multi-level time window model to compute correlation statistics at multiple levels, so that not only is the current correlation between stream pairs known, but historical correlations can also be queried through a multi-granularity aggregation tree (DSMGA-Tree), which returns multi-level correlation information and satisfies detailed queries within a certain error bound. Building on the multi-level correlation computation, the thesis proposes dynamic mining algorithms for the multi-stream environment: (1) the algorithms apply data mining techniques from the traditional database field to the multi-stream environment and cluster multiple streams dynamically; (2) the correlation-based dynamic multi-stream clustering algorithm (CBDMSClustering) adopts the density-based clustering idea of DBSCAN and can find clusters of arbitrary shape without increasing time complexity; (3) the relative-correlation-based dynamic multi-stream clustering algorithm (RCBDMSClustering) refines CBDMSClustering further: besides finding clusters of arbitrary shape, it can distinguish clusters with different degrees of correlation and identify highly correlated clusters that are masked by weakly correlated ones. Experiments show that these improvements to dynamic multi-stream clustering yield better clustering results.
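The two ideas the abstract combines, estimating pairwise correlation from a few DFT coefficients and then grouping streams with a density-based rule, can be illustrated with a minimal sketch. This is not the thesis's CBDMSClustering or RCBDMSClustering, and it omits the multi-level windows and the DSMGA-Tree. The function names (`dft_sketch`, `approx_corr`, `cluster_by_correlation`) and parameters (`k`, `min_corr`, `min_pts`) are hypothetical. The sketch assumes each window is normalized to zero mean and unit norm, so that Pearson correlation equals 1 - d²/2 for Euclidean distance d, and that `k` is smaller than half the window length. Because only the first `k` coefficients are kept, the estimate is an upper bound on the true correlation.

```python
import cmath
import math


def normalize(window):
    """Shift a window to zero mean and scale it to unit L2 norm."""
    mean = sum(window) / len(window)
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        raise ValueError("constant window has undefined correlation")
    return [v / norm for v in centered]


def dft_sketch(window, k):
    """Keep unitary-DFT coefficients 1..k of the normalized window (assumes k < n/2)."""
    x = normalize(window)
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(1, k + 1)
    ]


def approx_corr(sketch_a, sketch_b):
    """Upper-bound estimate of Pearson correlation from two sketches.

    For real signals, coefficients f and n-f are conjugates, so each kept
    coefficient contributes twice to the squared distance.
    """
    d2 = 2 * sum(abs(a - b) ** 2 for a, b in zip(sketch_a, sketch_b))
    return 1 - d2 / 2


def cluster_by_correlation(corr, min_corr, min_pts):
    """DBSCAN-style clustering where a neighbor is any stream with corr >= min_corr.

    Returns one label per stream; -1 marks noise.
    """
    n = len(corr)
    labels = [None] * n  # None means not yet visited

    def neighbors(i):
        return [j for j in range(n) if corr[i][j] >= min_corr]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels
```

To use it, build one sketch per stream window, fill a symmetric matrix with `approx_corr` for every pair, and pass that matrix to `cluster_by_correlation`. Since the estimate never underestimates the true correlation, a candidate pair that passes the threshold would still need checking against the raw windows if false positives matter.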
