Research on Adaptive Load Shedding Strategies for Continuous Queries over Data Streams
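【Author】 Wu Yajuan (吴亚娟);
【Supervisor】 Ma Ruimin (马瑞民);
【Author information】 Daqing Petroleum Institute (大庆石油学院), Computer Software and Theory, 2010, Master's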
【Abstract】 Adaptive load shedding for continuous queries over data streams has been one of the core topics in data stream research in recent years. Researchers in China and abroad have carried out extensive studies and experiments and obtained a number of results, but new problems have also emerged. For example, current data stream query processing techniques mainly operate on sliding-window data and assume that the size of the sliding window is fixed; in many practical applications this wastes memory and produces query results that fall short of the required accuracy. In addition, most existing window-based load shedding methods address the limits of main memory, while the limits of CPU computing capacity still need to be studied. Building on a detailed analysis of these problems, this thesis first proposes a new continuous query method for data streams based on variable-size windows. The variable window is implemented by adding a window controller to the cache. When memory is insufficient or the stream rate is too high, the window assignment operator partitions the incoming stream along the time sequence into windows of unequal size, according to the user's specific query and the actual data, and the matching operator performs similar-pattern matching against the sliding windows of different sizes according to the query plan, so that the windows correspond to the characteristics of the query object. This addresses both the waste of memory and the accuracy of query results. Second, to deal with CPU overload, the thesis proposes a novel local load shedding method: a sliding-window AND operation is used to merge the basic windows with higher output rates in adjacent sliding windows on the data stream, which effectively addresses the two basic questions of load shedding, namely when to shed and how to shed. Finally, the proposed load shedding method is given a preliminary application. Data preprocessing plays a very important role in data stream mining: preprocessing a data stream can improve its quality, can help prevent stream overload, and helps improve the precision and performance of the mining process. Based on the characteristics of data streams, and drawing on existing preprocessing methods for static data sets, the thesis proposes that when synopsis data structures are built with sampling, sliding-window models, and similar methods, the resulting stream samples be processed again, in order to further reduce the time and space complexity of mining algorithms.
【Key words】 data streams; data stream pre-processing; continuous query; load shedding; operator; threshold;
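The local load shedding idea described in the abstract (merging adjacent basic windows whose output rate is high) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the `BasicWindow` fields, the rate definition (tuples per time unit), the `rate_threshold` parameter, and the choice to summarise each basic window by a count and a sum are all assumptions made for illustration. The sliding-window AND operation mentioned in the abstract is not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BasicWindow:
    """Hypothetical summary of one basic window inside a sliding window."""
    start: int      # start timestamp (inclusive)
    end: int        # end timestamp (exclusive)
    count: int      # number of tuples summarised
    total: float    # sum of the tuple values

    @property
    def rate(self) -> float:
        # Assumed rate definition: tuples per time unit.
        return self.count / (self.end - self.start)


def merge(a: BasicWindow, b: BasicWindow) -> BasicWindow:
    """Combine two adjacent basic windows into one coarser summary."""
    return BasicWindow(a.start, b.end, a.count + b.count, a.total + b.total)


def shed_load(windows: List[BasicWindow], rate_threshold: float) -> List[BasicWindow]:
    """Merge runs of adjacent basic windows whose rate exceeds the threshold.

    Low-rate windows are kept as they are, so only the "hot" part of the
    stream is coarsened (the local aspect of the shedding).
    """
    result: List[BasicWindow] = []
    previous_was_hot = False
    for w in windows:
        hot = w.rate > rate_threshold
        if hot and previous_was_hot:
            # Extend the current run of hot windows.
            result[-1] = merge(result[-1], w)
        else:
            result.append(w)
        previous_was_hot = hot
    return result


if __name__ == "__main__":
    stream = [
        BasicWindow(0, 10, 5, 5.0),
        BasicWindow(10, 20, 50, 100.0),
        BasicWindow(20, 30, 80, 160.0),
        BasicWindow(30, 40, 4, 8.0),
        BasicWindow(40, 50, 90, 90.0),
    ]
    for w in shed_load(stream, rate_threshold=3.0):
        print(w)
```

In this sketch the two adjacent high-rate windows covering timestamps 10 to 30 collapse into one summary, so a downstream operator has fewer basic windows to process, while count and sum aggregates over the whole sliding window are unchanged. Deciding when to trigger `shed_load` (for example, from a measured CPU load) is left out.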