节点文献

时间序列数据挖掘相似性度量和周期模式挖掘研究

Similarity Measure and Periodic Pattern Mining of Time Series Data Mining

【作者】 董晓莉

【导师】 王正欧;

【作者基本信息】 天津大学 , 管理科学与工程, 2007, 博士

【摘要】 随着信息技术的飞速发展,数据挖掘受到越来越多的关注。时序数据在现实生活中广泛存在,如金融市场、工业过程、科学试验、医疗、气象、水文、生物信息等,而且存储规模呈现爆炸式增长。因此对时间序列数据挖掘问题进行深入研究是非常必要和富有挑战性的。目前的时间序列数据挖掘技术尚处于起步阶段,挖掘算法有待扩充和完善。本文在综述了时间序列数据挖掘研究发展概况后,对目前的主要方法进行了总结评述,在重新描述、相似性比较和周期模式挖掘几个方面进行了深入研究。最后在总结全文的基础上,指出了本文有待深入研究的若干问题。本文的创新性工作主要包括以下内容:1)提出了基于形态的时间序列相似性度量方法。本方法在时间序列分段线性化的基础上,采用了基于斜率相对变化的符号化重新描述方法,可以有效描述序列形态的动态变化趋势;同时提出了一个与之对应的距离度量公式,克服了点距离度量中存在的对各种扰动敏感的缺陷。实验证明,本方法还具有时间多分辨率特征,可以比较在不同时间分辨率下的时间序列的相似程度。2)提出了局部分段动态时间扭曲算法。经典动态时间扭曲算法(DTW)在时间序列相似性度量中具有重要作用,但由于计算复杂度较高,很难应用于实际数据库中。本文提出了一个新的算法——局部分段动态时间扭曲算法。在对时间序列进行分段线性化的基础上,将每一个段视为一个整体,应用经典的动态时间扭曲算法,通过设置补偿系数,保证了算法的精度。实验表明,本算法能够在计算精度几乎没有损失的情况下,有效地提高经典DTW算法的效率。3)提出了一种高效的时间序列异步周期1-模式挖掘算法。本算法设计了一种基于2进制编码的映射算法,并提出了改进的点乘算法,可以通过一次计算发现一个事件在序列中出现的所有位置;并且,本算法用并行计算替代了原算法中的串行计算方法,显著减少了数据的运算和存储次数。实验证明,本算法在完全不降低原算法准确性的基础上,显著提高了算法效率。4)首次提出了时间序列局部周期频繁模式的概念及其挖掘算法。不同于现有的所有周期挖掘算法,本算法不但能够挖掘出贯穿时间序列全局的频繁发生的周期模式,而且能够发现只在某个局部频繁发生的周期模式。本算法首先将时间序列划分为局部集合,然后基于数据自行找出序列中隐藏的潜在周期,生成局部周期频繁1-模式,最后在每一个有交叉的局部上,应用最大命中子模式树算法合成复杂模式输出。实验证明,本算法可以有效地发现时间序列中的局部周期频繁模式,其中的剪切算法和周期阈值公式能够有效提高算法效率。

【Abstract】 Data Mining has attracted much attention with the development of information technology. Time series data are a kind of important data existing in a lot of fields, such as financial market, industrial process, science experiments, etc, and the quantity of time series data has explosively increas. So it is necessary to study on the subject of the time series data mining. Nowadays, time series mining technology is still in its infancy and the algorithms are expected to be extended and to be complemented.After major issues in time series data mining surveyed, some algorithms are summarized and appraised in this dissertation. Then the problems on time series’ representation, similarity measure and periodic pattern mining are deeply researched. At last, on the basis of the summery of the whole dissertation, we propose the several problems needed to be further researched in the future. The main innovative achievements are described as follows.1) A novel method of shape-based time series similarity measure is proposed. Based on the PLR algorithm, the time series is represented by using the relative change of the slope of the lines, which effectively reflects the degree of the dynamic change of the tendency of the curves. Moreover a corresponding distance measure formula is proposed, which can increase the robust to disturbances compared with the point-to-point Euclidean distance measure. The experimental results show that the approach can effectively measure the similarity of time series under various analyzing frequency.2) A novel method of local segmented dynamic time warping distance measure (LSDTW) for time series data mining is proposed. DTW is of great importance in time series similarity measure, but its expensive computation limits its application in massive datasets. The LSDTW algorithm is proposed to solve this problem. Based on the PLR algorithm, regarding every segment as a whole, classical DTW algorithm is used, and a compensate coefficient is proposed to ensure the accuracy of the method. The experimental results show that the approach gives better computational efficiency compared with the classical DTW without loss of accuracy.3) A highly efficient of asynchronous periodic 1-patterns mining method is proposed. A binary representation based mapping scheme is designed, and a modified dot product algorithm is proposed to find all the positions of an event in the time series; and a parallel calculation method is proposed to replace the series calculation method, which notably decrease the times of the calculation and access process. The experimental results show that the approach significantly increases the efficiency without loss of the accuracy.4) A novel concept and its mining method of local periodic frequent patterns in time series are proposed. Different from all exiting algorithms, this method not only can mine the frequent periodic patterns arised through the whole time series, but also can find the periodic patterns happened just in the locality of the sequence. This method first divides the time series into a set of fragments, then finds the hidden potential periodicities based on the data, and produces the local periodic frequent 1-pattern, at last using the max-subpattern tree method merges and outputs complex local periodic frequent patterns. The experimental results show that the approach can effectivly mine the local periodic frequent patterns in time series, and the cut algorithm and the periodicity threshold formula proposed in this paper can significantly increase the efficiency.

  • 【网络出版投稿人】 天津大学
  • 【网络出版年期】2009年 04期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络