
Research and Application of Hydrological Time Series Data Mining Algorithms

【Author】 吴佳文 (Wu Jiawen)

【Supervisor】 王丽学 (Wang Lixue)

【Author Information】 Shenyang Agricultural University, Agricultural Soil and Water Engineering, 2011, PhD

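【Chinese Abstract】 Hydrological time series data mining is the application of data mining techniques to hydrology. Guided by the characteristics of hydrological data and the information needs of the field, it selects efficient data mining algorithms to extract useful information and knowledge from large volumes of hydrological data, providing new analysis methods and scientific decision support for solving prominent problems in hydrology. Research on and application of these algorithms are still at an early stage. Building on an in-depth analysis of data mining techniques and of the characteristics of hydrological time series data, this dissertation focuses on data mining algorithms for the pattern representation, similarity measurement, classification, and prediction of hydrological time series, and validates and evaluates them with measured hydrological data. The main research content and results are as follows:
1. Taking the local extreme points and shape features of hydrological time series as the starting point, a pattern representation method for hydrological time series based on factor features is proposed. It overcomes the unsuitability of piecewise representation algorithms caused by frequent short-term fluctuations, numerous local extreme points, and unevenly spaced observation times. Experiments show that the method is simple, efficient, and widely applicable.
2. An improved dynamic time warping (DTW) distance formula, the adaptive segmented dynamic time warping (ASDTW) distance, is proposed, completing a similarity measurement algorithm for hydrological time series based on factor features. Experiments show that the method has distinct advantages both in capturing overall pattern trends and in its fitting error relative to the original time series, making it well suited to the characteristics of hydrological time series data.
3. The use of data mining algorithms to classify hydrological time series is explored. Model trees and instance-based learning algorithms are improved according to the characteristics of hydrological time series data. Support vector regression is fused with model trees and applied to data mining of equally spaced hydrological process data. The traditional instance-based learning algorithm is improved in three respects (sample set extraction, handling of interfering samples, and attribute weighting) and combined with the ASDTW distance to build a data mining model for unequally spaced excerpted hydrological factor series.
4. Taking the basin of one hydrological station as the study area, the model tree algorithm based on support vector regression is applied to build a data mining prediction model, which is compared with the Xin'anjiang model. The data mining model requires less input data, has a simpler procedure and a smaller maintenance workload, and its accuracy is assured.
5. For the excerpted hydrological factor series stored in hydrological databases, the improved instance-based learning algorithm is used to build a data mining model through the full workflow of data preparation, data preprocessing, initial instance set construction, sample set selection, attribute weight assignment, and similarity measurement. Compared with the traditional empirical rainfall-runoff correlation method, the model is simple to operate, fast to compute, easy to maintain, and produces reliable results, and therefore has practical value.
Main innovations: building on the dynamic time warping (DTW) distance and adapting it to the characteristics of hydrological time series, the ASDTW distance formula is proposed and combined with the improved instance-based learning algorithm to solve the data mining problem for excerpted hydrological factor series; in addition, support vector regression is fused with model trees and applied to daily runoff forecasting for the first time, enriching the technical means available for hydrological time series data mining.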
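
Item 1 names the inputs of the pattern representation (local extreme points and shape features) but not its exact feature set. As a minimal sketch, assuming a series of (time, value) pairs with strictly increasing but possibly unequal time stamps, the turning points can be extracted and each segment between them summarized by duration, value change, slope, and a trend label. The function names and the chosen segment features are hypothetical, not the dissertation's definitions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time, value); time stamps may be unevenly spaced


def extreme_points(series: List[Point]) -> List[Point]:
    """Keep both endpoints plus every local maximum or minimum (turning point)."""
    if len(series) <= 2:
        return list(series)
    # Collapse runs of equal values (keep each run's first point) so that a
    # flat stretch is judged by the values on either side of it.
    compact = [p for i, p in enumerate(series) if i == 0 or p[1] != series[i - 1][1]]
    kept = [series[0]]
    for i in range(1, len(compact) - 1):
        prev_v, v, next_v = compact[i - 1][1], compact[i][1], compact[i + 1][1]
        if (v > prev_v and v > next_v) or (v < prev_v and v < next_v):
            kept.append(compact[i])
    kept.append(series[-1])
    return kept


def segment_features(points: List[Point]) -> List[Tuple[float, float, float, str]]:
    """Summarize each segment between consecutive kept points.

    Returns (duration, value change, slope, trend label) per segment. Dividing by
    the actual duration keeps the slope comparable under uneven sampling.
    Assumes strictly increasing time stamps.
    """
    features = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt, dv = t1 - t0, v1 - v0
        trend = "rise" if dv > 0 else "fall" if dv < 0 else "flat"
        features.append((dt, dv, dv / dt, trend))
    return features


# Synthetic stage record with uneven time stamps (hours)
stage = [(0, 1.0), (1, 2.0), (3, 5.0), (4, 4.0), (7, 4.5), (8, 2.0), (10, 3.0)]
turning = extreme_points(stage)
print(turning)
print(segment_features(turning))
```

Because segments carry their own durations, no resampling to a fixed step is needed before comparison, which is the property item 1 emphasizes for unevenly timed data.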
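
ASDTW is described as an improvement on classic DTW, whose cumulative cost follows the recurrence D(i, j) = c(a_i, b_j) + min{D(i-1, j), D(i, j-1), D(i-1, j-1)}. The sketch below implements that standard recurrence and applies it to segment descriptors (duration, value change) rather than raw points, which is the general idea behind segment-level DTW. The adaptive segmentation and weighting specific to ASDTW are not given in the abstract, so `segment_cost` and its `w_time` weight are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def dtw(a: Sequence[T], b: Sequence[T], cost: Callable[[T, T], float]) -> float:
    """Classic DTW: minimum cumulative local cost over monotone, continuous alignments."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # step in a only
                D[i][j - 1],      # step in b only
                D[i - 1][j - 1],  # step in both
            )
    return D[n][m]


Segment = Tuple[float, float]  # (duration, value change) of one segment


def segment_cost(s: Segment, t: Segment, w_time: float = 0.5) -> float:
    """Hypothetical local cost: difference in value change plus weighted difference in duration."""
    return abs(s[1] - t[1]) + w_time * abs(s[0] - t[0])


# Two synthetic floods already reduced to segments; flood_b has one more segment
flood_a = [(3.0, 4.0), (1.0, -1.0), (3.0, 0.5)]
flood_b = [(2.0, 4.0), (1.0, -1.0), (1.0, 0.2), (2.0, 0.3)]
print(dtw(flood_a, flood_b, segment_cost))
```

Passing the local cost as a function keeps the warping recurrence separate from the segment comparison, so a different segment metric can be tried without touching the dynamic program.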

【Abstract】 Hydrological time series data mining, namely the application of data mining techniques to hydrological time series, can offer new analysis methods and scientific decision support for solving outstanding problems in hydrology by extracting useful information and knowledge from large volumes of hydrological data with efficient data mining algorithms chosen according to the data characteristics and information needs of the field. However, research on and application of its algorithms are still at an early stage. Based on an analysis of data mining techniques and of the features of hydrological time series, this dissertation studies data mining algorithms for the pattern representation, similarity measurement, classification, and prediction of hydrological time series, and validates and evaluates them with measured hydrological data. The main contents and achievements are as follows:
1. Taking the local extreme points and shape features of hydrological time series as the starting point, a pattern representation method based on the factor features of hydrological time series is put forward. It overcomes the unsuitability of piecewise linear pattern representation algorithms caused by frequent short-term fluctuations, numerous local extreme points, unequal time intervals between data points, and so on. Experiments show that the method is simple, efficient, and adaptable.
2. An improved dynamic time warping formula, adaptive segmented dynamic time warping (ASDTW), is put forward, completing the factor-feature-based similarity measurement algorithm for hydrological time series. Experiments show that the method has unique advantages both in capturing overall pattern trends and in its fitting error relative to the original time series, and is therefore well suited to the characteristics of hydrological time series.
3. The classification of hydrological time series with data mining algorithms is discussed. Model tree and instance-based learning algorithms are improved according to the features of hydrological time series. Support vector regression and model trees are fused and applied to data mining of equal-interval hydrological processes. The classical instance-based learning algorithm is improved in three directions (sample extraction, disturbance control, and attribute weighting) and combined with ASDTW to establish a data mining model for unequal-interval excerpted hydrological factor series.
4. The model tree algorithm based on support vector regression is applied in practice, taking the basin of a hydrological station as the study area. A data mining prediction model is built and compared with the Xin'anjiang model. The comparison shows that the former requires less input data, has a simpler procedure and a smaller maintenance workload, and still ensures accuracy.
5. For the excerpted hydrological factor series in hydrological databases, a data mining model is built through a continuous procedure of data preparation, data preprocessing, instance initialization, sample selection, attribute weighting, and similarity measurement. Compared with the traditional flood forecasting method of empirical rainfall-runoff correlation, the model is simple, fast, easy to maintain, and reliable, and is therefore of practical value.
The main innovations of this dissertation are as follows: the adaptive segmented dynamic time warping (ASDTW) formula is put forward on the basis of DTW according to the features of hydrological data, and an improved instance-based learning algorithm combined with ASDTW solves the data mining problem for excerpted hydrological factor series; for the first time, model trees are fused with support vector regression and successfully applied to daily runoff prediction, which enriches the techniques of hydrological time series data mining.
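
A model tree partitions the input space with a regression tree and fits a regression model in each leaf; items 3 and 4 describe placing support vector regression models at the leaves. To stay dependency-free, the hypothetical sketch below uses a single predictor and replaces SVR with ordinary least squares in each leaf (an M5-style linear model tree), choosing each split to minimize the summed squared error of the two child fits. The data are synthetic, not from the study basin.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b*x (stand-in for the SVR leaf model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of squared residuals of the leaf model fitted to (xs, ys)."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


@dataclass
class Node:
    model: Tuple[float, float]          # (intercept, slope) fitted on this node's data
    threshold: Optional[float] = None   # None means the node is a leaf
    left: Optional["Node"] = None       # samples with x <= threshold
    right: Optional["Node"] = None      # samples with x > threshold


def build(xs: Sequence[float], ys: Sequence[float], min_leaf: int = 3, max_depth: int = 3) -> Node:
    """Grow the tree greedily, splitting where the two child line fits leave the least error."""
    node = Node(model=fit_line(xs, ys))
    if max_depth == 0 or len(xs) < 2 * min_leaf:
        return node
    pairs = sorted(zip(xs, ys))
    best_err, best_k = sse(xs, ys), None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        lx, ly = zip(*pairs[:k])
        rx, ry = zip(*pairs[k:])
        err = sse(lx, ly) + sse(rx, ry)
        if err < best_err - 1e-12:  # split only if it clearly reduces error
            best_err, best_k = err, k
    if best_k is None:
        return node
    node.threshold = (pairs[best_k - 1][0] + pairs[best_k][0]) / 2
    lx, ly = zip(*pairs[:best_k])
    rx, ry = zip(*pairs[best_k:])
    node.left = build(lx, ly, min_leaf, max_depth - 1)
    node.right = build(rx, ry, min_leaf, max_depth - 1)
    return node


def predict(node: Node, x: float) -> float:
    """Route x to a leaf and evaluate that leaf's linear model."""
    while node.threshold is not None:
        node = node.left if x <= node.threshold else node.right
    a, b = node.model
    return a + b * x


# Synthetic rainfall (mm) vs runoff (mm): runoff responds more steeply above about 5 mm
rain = list(range(10))
runoff = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.5, 6.5, 8.5, 10.5]
tree = build(rain, runoff)
print(tree.threshold, predict(tree, 2.0), predict(tree, 8.5))
```

Replacing `fit_line` with a per-leaf support vector regression (for example scikit-learn's `sklearn.svm.SVR`) would bring the sketch closer to the fused model described in items 3 and 4, at the cost of a third-party dependency.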
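
The improved instance-based learner of items 3 and 5 combines attribute weighting with a DTW-type distance so that rainfall or flow processes of unequal length can be compared directly. The sketch below assumes each stored case has a few scalar attributes, one process sequence, and an observed target; it ranks cases by a weighted attribute distance plus a plain DTW distance (standing in for ASDTW) and predicts with an inverse-distance-weighted mean of the k nearest targets. The attribute weights are taken as given, the sample-set extraction and disturbance-control steps are not reproduced, and all names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with absolute-difference local cost (a stand-in for ASDTW)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# A stored case: (scalar attributes, process sequence of any length, observed target)
Case = Tuple[Sequence[float], Sequence[float], float]


def distance(q_attrs: Sequence[float], q_seq: Sequence[float], case: Case,
             weights: Sequence[float], seq_weight: float) -> float:
    """Weighted attribute distance plus weighted DTW distance between processes."""
    attrs, seq, _ = case
    d_attr = sum(w * abs(x - y) for w, x, y in zip(weights, q_attrs, attrs))
    return d_attr + seq_weight * dtw(q_seq, seq)


def predict(q_attrs: Sequence[float], q_seq: Sequence[float], cases: List[Case],
            weights: Sequence[float], seq_weight: float = 1.0, k: int = 3) -> float:
    """Inverse-distance-weighted mean of the k nearest cases' targets."""
    scored = sorted((distance(q_attrs, q_seq, c, weights, seq_weight), c[2]) for c in cases)
    nearest = scored[:k]
    for d, target in nearest:
        if d == 0.0:
            return target  # an identical case: reuse its outcome directly
    num = sum(t / d for d, t in nearest)
    den = sum(1.0 / d for d, _ in nearest)
    return num / den


# Hypothetical case base: (antecedent rainfall,), rainfall process, flood peak
cases: List[Case] = [
    ((10.0,), [0, 5, 10, 5, 0], 100.0),
    ((10.0,), [0, 2, 4, 2, 0], 40.0),
    ((50.0,), [0, 5, 10, 5, 0], 300.0),
]
print(predict((10.0,), [0, 4, 8, 4, 0], cases, weights=(0.5,), k=2))
```

Because DTW tolerates stretching, a query process with a repeated or extra time step can still match a stored case exactly, which is what makes this pairing attractive for unequally spaced excerpt series.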
