Research on Time Series Mining and Similarity Search Techniques
Research on Mining and Similar Searching in Time Series Database
【Author】 Tang Liang (唐亮);
【Advisor】 Zhang Wenlong (张文龙);
【Author information】 Shanghai Normal University, Computer Applications, 2004, Master's degree
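【Summary】 A time series is an important kind of data object found in many real-world domains, such as stock prices, merchandise sales data, and meteorological data. Over time, the volume of stored data of this kind grows explosively. How to carry out effective knowledge discovery on such massive time series data and mine the change patterns it contains, and how to run similarity retrieval over a massive time series database for user-supplied change patterns that carry various abstract meanings, is therefore a challenging topic of both theoretical and practical importance. It offers guidance for correctly understanding how things change, making decisions scientifically, and identifying abnormal behavior.
Based on an analysis of the characteristics of time series and of practical application requirements, this thesis studies several key techniques for time series mining and similarity search, specifically feature pattern mining, multi-sequence association pattern mining, and similar-pattern search. The work done and the innovative results achieved fall into the following three areas:
1) Feature pattern mining in time series. A method for mining time series feature patterns based on the Inter-Related Successive Tree (IRST) model is proposed for the first time. Unlike traditional approaches, the method segments the series with a novel important-point-based linear segmentation algorithm, and in the symbolization step it uses a local symbolization method based on relative slope. This both reduces computational complexity and avoids the influence of noise. For the mining algorithm itself, an IRST mining algorithm is proposed that exploits the ordered and repetitive nature of sequence feature patterns and does not need to generate a large set of candidate patterns, which greatly improves mining efficiency. Experimental results show that the mined patterns are not merely graphical descriptions but also carry clear practical meaning, which greatly benefits their use in practice.
2) Association pattern mining across multiple time series. For multi-sequence association patterns, which have greater analytical value, a further novel association pattern mining method is proposed. The method uses Allen's interval logic relations to describe the associations between time series patterns, avoiding the lack of synchronization that traditional methods suffer from when describing such associations. A time observation window is then used to construct a special form of pattern sequence that contains both parallel and serial patterns. Finally, a generalized IRST model is built on this sequence, and the association patterns are mined with the same approach as before. Experimental results show that the new method achieves better mining efficiency and better mining results than the traditional Apriori algorithm.
3) Similarity search in time series. After analyzing and comparing the similarities and differences between time series and full-text sequences, the thesis adopts full-text indexing techniques and proposes, for the first time, an IRST-based similarity search method for time series. The method uses the segment dynamic time warping distance, built on important-point segmentation, as the similarity measure, which both keeps the measure robust and reduces computational complexity. Six main features extracted from each segment are used to convert the time series into a specific symbol sequence, on which a large-scale full-text index structure supports indexed similarity search. It is proved in theory that the index search incurs no false dismissals, and experimental results also show that the method has a clear advantage over traditional methods.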
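The segmentation and symbolization steps described in item 1) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the record does not give the important-point criterion or the symbol alphabet, so the sketch assumes a common definition (recursively pick the point with the largest vertical distance from the line joining the segment's endpoints) and a hypothetical three-letter alphabet driven by a plain slope threshold rather than the relative slope used in the thesis. The names `important_points`, `symbolize`, `max_error`, and `flat_slope` are all illustrative.

```python
def important_points(series, max_error):
    """Return sorted indices of important points, always including both ends.

    Assumption: a point is "important" if it is the interior point farthest
    (vertically) from the chord joining the current segment's endpoints and
    that distance exceeds max_error.
    """
    def split(lo, hi):
        best_idx, best_dist = None, 0.0
        for i in range(lo + 1, hi):
            # Height of the chord from (lo, series[lo]) to (hi, series[hi]) at i.
            interp = series[lo] + (series[hi] - series[lo]) * (i - lo) / (hi - lo)
            dist = abs(series[i] - interp)
            if dist > best_dist:
                best_idx, best_dist = i, dist
        if best_idx is None or best_dist <= max_error:
            return []  # the chord already approximates this stretch well enough
        return split(lo, best_idx) + [best_idx] + split(best_idx, hi)

    if len(series) < 2:
        return list(range(len(series)))
    return [0] + split(0, len(series) - 1) + [len(series) - 1]


def symbolize(series, points, flat_slope):
    """Map each segment between consecutive important points to a symbol.

    Hypothetical alphabet: 'U' (rising), 'D' (falling), 'F' (roughly flat).
    """
    symbols = []
    for a, b in zip(points, points[1:]):
        slope = (series[b] - series[a]) / (b - a)
        if slope > flat_slope:
            symbols.append("U")
        elif slope < -flat_slope:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


if __name__ == "__main__":
    data = [0, 1, 2, 3, 2, 1, 0, 0, 0]
    pts = important_points(data, max_error=0.5)
    print(pts, symbolize(data, pts, flat_slope=0.1))
```

Once a series has been reduced to a short symbol string of this kind, frequent-pattern mining and full-text-style indexing can operate on strings instead of raw numeric values, which is the general idea the summary describes.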
【Abstract】 Time series are an important kind of data found in many fields, such as stock prices and weather records. As time passes, the volume of such data grows explosively. It is therefore an important and challenging problem to study how to discover valuable knowledge in large-scale time series databases, and how to perform similarity-based search when a user supplies a graphical query pattern. Such research helps us discover the principles by which things change or develop, supports decision-making, and so on.
This thesis addresses several key technical problems of pattern mining and similarity-based search in time series, covering feature pattern and relationship pattern mining, similarity-based pattern search in time series and streaming time series, and issues in implementing an analysis-oriented application system. The major contributions of this thesis include:
1. Mining feature patterns in time series. A novel method is proposed to discover frequent patterns in time series. Unlike existing methods, it first segments the time series at a series of perceptually important points, and then converts the series into meaningful symbol sequences according to domain knowledge and the relative slope of each linear segment. A new data model, called Inter-Related Successive Trees (IRST), is then designed to find frequent patterns in multiple time series without generating large numbers of candidate patterns. Experiments show that the method is simpler, more flexible, more efficient, and more useful than previous methods.
2. Mining relationship patterns in multiple time series. An algorithm for discovering frequent patterns across multiple time series is proposed. In this algorithm, the relationships between states in the time series are first represented with Allen's temporal logic; a sliding window is then used to examine the ordering and co-occurrence relationships of the states and to obtain a special sequence. On the basis of this sequence, a model called GIRST is developed to find the frequent relationship patterns in multiple time series. Experiments show that, compared with previous methods, the method is simpler, more efficient, and of greater practical value.
3. Similarity search in time series. A novel method is proposed for fast search of similar patterns in time series using a full-text index technique. The method first segments the time series at a series of perceptually important points and uses the segment dynamic time warping distance as the similarity measure; the time series are then converted into meaningful symbol sequences according to the segments' features and MATH categorization. The IRST index model described above is then used to achieve fast similarity retrieval in multiple time series. The method is proved in theory to incur no false dismissals, and experiments show that, compared with previous methods, it searches more efficiently and allows matching of sequences of different lengths.
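For readers unfamiliar with the distance measure named in item 3, the following is a minimal sketch of the classic point-wise dynamic time warping (DTW) distance. It is the textbook formulation, not the segment-based variant used in the thesis, whose details this record does not give; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    items of a with the first j items of b. Local cost is |a[i-1] - b[j-1]|.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # The second series repeats one value, so warping aligns them exactly.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because one element may be aligned with several elements of the other sequence, DTW can compare sequences of different lengths, which is consistent with the abstract's remark that the method allows matching of different lengths.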
【Key words】 Time series; Data Mining; Search Based Similarity; Inter-Related Successive Trees (IRST);
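- 【Online publication contributor】 Shanghai Normal University 【Online publication year and issue】 2004, Issue 03
- 【Classification number】 TP311.13
- 【Times cited】 7
- 【Downloads】 540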