基于小波变换的流数据压缩算法研究

The Research on Streaming Data Compression Algorithm Based on Wavelet Transform

【Author】 Li Lan (李兰)

【Advisors】 Zhou Siwang (周四望); Zhang Wenqi (张文奇)

【Author information】 Hunan University, Software Engineering, 2010, Master's degree

【Abstract】 The rapid development of network communication technologies in recent years has produced large volumes of streaming data, and many real-time application systems must handle online, continuous data streams. Because streaming data is massive and unbounded, it cannot be stored in full by conventional means, and transmitting it unprocessed consumes a large share of limited network bandwidth and causes congestion. Compressing streaming data is therefore of practical importance. This thesis studies three topics: dislocation (time-shifted) similarity between data-stream time series, clustering-based compression, and the multi-wavelet transform. The main contributions are as follows.

(1) A data stream processing method based on dynamic time warping (DTW). The streaming data collected over a period of time is treated as a time series. Because the factors that drive changes in data streams within the same period are largely the same, some streams exhibit dislocation similarity: their shapes are roughly the same, but one leads or lags the other in time. The commonly used Euclidean distance measure cannot identify this kind of similarity, whereas dynamic time warping can. Starting from the corresponding points of two time series obtained by the DTW warping path method, the thesis proposes an algorithm that uses prediction to estimate the relationship between the two series and thereby determines their best matching points.

(2) A clustering compression algorithm based on the similarity of multivariate time series. The DTW distance is first used to analyze the similarity between data streams, and the streams are fuzzy-clustered according to their degree of similarity. The center of each cluster is then selected as its characteristic time series. Finally, the compressed data consist of the stream identifiers in each cluster, the wavelet coefficients of the characteristic series, and, for every other stream, its matching point pairs and relationship coefficients with respect to the characteristic series. A decompression algorithm built on the best-matching-point algorithm is also given. Simulation results show that the algorithm compresses data streams effectively and achieves better accuracy than the same scheme using the Euclidean distance measure.

(3) A streaming data compression algorithm based on the multi-wavelet transform. A multi-wavelet transform decomposes a multi-attribute data stream into four sub-matrices with different spatial orientations and resolutions, each of which can be decomposed further, and most of the energy of the stream is concentrated in the low-frequency sub-matrix. The algorithm exploits this property by encoding and compressing the transformed wavelet coefficients. Experimental results show that the algorithm achieves a high compression ratio and preserves the characteristics of the data well, so that the decompressed data closely reproduces the original stream.
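
The contrast the abstract draws between Euclidean distance and dynamic time warping can be illustrated with a minimal sketch. This is the textbook DTW recurrence with an absolute-difference point cost, not the thesis's own prediction-based matching algorithm; the function names `dtw` and `euclidean` and the sample series are assumptions made for illustration.

```python
from math import inf


def dtw(a, b):
    """Classic DTW: return (distance, warping path) for two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed point-wise cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the corresponding index pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def euclidean(a, b):
    """Point-by-point Euclidean distance between equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical example: the same pulse, delayed by two samples.
base = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 3, 1, 0]

dist, path = dtw(base, shifted)
print(dist, euclidean(base, shifted))
```

For this pair the DTW distance is 0, because the warping path absorbs the delay, while the Euclidean distance is clearly nonzero. The index pairs in `path` are the kind of corresponding points from which a best-match estimate could start.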
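
The idea that energy concentrates in the low-frequency coefficients, so that small coefficients can be discarded, can be sketched as follows. As a stated simplification, the sketch uses an ordinary one-dimensional orthonormal Haar wavelet on a single stream rather than the multi-wavelet transform on a multi-attribute matrix described in the abstract; the function names, the threshold value, and the sample readings are all hypothetical.

```python
from math import sqrt


def haar_forward(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two."""
    coeffs = [float(v) for v in x]
    n = len(coeffs)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / sqrt(2) for i in range(half)]
        coeffs[:n] = approx + detail  # low-frequency part first, details after
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / sqrt(2), (a - d) / sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out


def compress(x, threshold):
    """Keep only coefficients whose magnitude reaches the threshold."""
    return {i: c for i, c in enumerate(haar_forward(x)) if abs(c) >= threshold}


def decompress(sparse, length):
    """Rebuild the stream, treating dropped coefficients as zero."""
    coeffs = [0.0] * length
    for i, c in sparse.items():
        coeffs[i] = c
    return haar_inverse(coeffs)


# Hypothetical slowly varying sensor readings.
stream = [20.0, 20.1, 20.0, 20.2, 20.4, 20.3, 20.5, 20.4]
sparse = compress(stream, threshold=0.2)
restored = decompress(sparse, len(stream))
max_error = max(abs(a - b) for a, b in zip(stream, restored))
print(len(sparse), "of", len(stream), "coefficients kept; max error", max_error)
```

Storing only the surviving index-coefficient pairs is lossy: raising the threshold keeps fewer coefficients at the cost of a larger reconstruction error, which is the trade-off any threshold-based wavelet coder has to tune.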

【Key words】 streaming data; time series; compression; clustering; wavelet transform; multi-wavelet
  • 【Online publication contributor】 Hunan University
  • 【Online publication year and issue】 2011, Issue 04