水质时间序列数据挖掘及其应用集成研究

Study on Water Quality Time Series Data Mining and Application Integration

【Author】 刘祥明 (Liu Xiangming)

【Advisor】 石为人 (Shi Weiren)

【Author information】 Chongqing University, Control Theory and Control Engineering, 2011, PhD

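【Summary】 As the impoundment level of the Three Gorges Reservoir on the Yangtze River keeps rising, the water environment of the reservoir area has been strongly affected. To better understand water quality conditions and safeguard the reservoir's water environment, environmental protection departments have progressively built an increasingly complete water quality monitoring system for the Three Gorges Reservoir area, which has produced a large volume of water quality time series data. Because time series data are high-dimensional, complex, dynamic, noisy, and prone to growing very large, a method is urgently needed to discover water quality trends and distribution patterns in these data. Taking time series data mining as its theoretical basis and the Chongqing key science and technology project (CSTC,2006AA7024) "Research on Key Technologies of the Water Environment Safety Early Warning Platform and Decision Support for the Three Gorges Reservoir Area" as its application background, this thesis addresses water quality monitoring time series in the water environment. Centered on the theory and methods of time series data mining, the work covers four areas: time series pattern representation, similarity measures for multivariate time series, time series forecasting, and the integration of time series data mining models with the early warning platform.

① The thesis analyzes pattern representations of time series, with emphasis on piecewise representations. By examining the basic ideas of piecewise linear and piecewise polynomial representation, it combines the global continuity of the former with the local shape preservation of the latter and proposes a piecewise polynomial continuous representation of time series. Experiments show that this representation keeps the local-shape advantage of piecewise polynomials while providing the global continuity of the piecewise linear representation, remains compatible with the piecewise linear representation, and preserves the shape of the series better. Because of this shape preservation, the algorithm can serve as a basic algorithm for trend extraction and noise filtering, and it can be applied to the preprocessing of water quality time series in the water environment safety domain.

② Building on an analysis of Lp-distance and DTW-distance similarity measures for univariate time series, the thesis introduces spatial path similarity into similarity measurement for multivariate time series, so that the similarity of two multivariate series is determined by the similarity of their paths in space. It proposes a multivariate time series similarity measure based on path DTW similarity and applies it to multivariate time series clustering. Experiments compare clustering based on summed univariate similarity, on path Euclidean distance similarity, and on path DTW similarity. Clustering with path DTW similarity classified all series correctly, clustering with path Euclidean distance misclassified only multivariate series that differ slightly from one another, and both outperformed clustering with summed univariate similarity. Applied to river classification in the water environment, clustering based on path DTW similarity gave good practical results.

③ To address overfitting in neural network forecasting of time series, the thesis studies the basic principles of RBF networks and neural network ensembles and, combining PCA with sample clustering, proposes an RBF neural network ensemble forecasting method for time series. Samples are formed by segmenting the time series, and PCA is applied to obtain new samples. The dimension of the new samples is used as the input dimension of each individual RBF network, the centers and radii of the new-sample clusters are selected in order as the center parameters of the individual RBF networks, and the prior knowledge that each individual RBF network has about the input/output relationship is introduced into the averaging ensemble. Experiments show that the forecast accuracy of the method is higher than that of any individual RBF network. Applied to water quality forecasting in the water environment, the method gave good practical results.

④ The thesis studies how to integrate time series data mining with the water environment safety early warning platform for the Three Gorges Reservoir area. Based on a service-request application integration mechanism, it designs a service-request-based application integration architecture and an open integrated computing server architecture, and it defines XML-based service request protocols between application clients and the integrated computing service for model query, model catalog, computation request, and computation result.

The time series data mining methods studied in this thesis can effectively extract water quality trends and filter noise, classify rivers according to their water quality indicators, and forecast water quality. The open integrated computing server architecture supports the extension of water quality model services and also meets the needs of other systems or platforms for these models. The feasibility of the approach was verified in practical application.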

【Abstract】 The rising water level of the Three Gorges Reservoir has had a major impact on the water quality of the reservoir. To track water quality conditions more closely and ensure the safety of the reservoir's water environment, the environmental protection departments have gradually established an increasingly complete water quality monitoring system for the Three Gorges Reservoir and have collected a large amount of water quality time series data. Because time series data are high-dimensional, complex, dynamic, noisy, and easily reach a large scale, a method is needed to find the variation and distribution of water quality in these data. Based on the theory of Time Series Data Mining (TSDM) and supported by a project of the Chongqing Science and Technology Committee, "Research on Key Technologies of the Water Environment Safety Early Warning Platform and Decision Support for the Three Gorges Reservoir Area" (CSTC2006AA7024), this research works with water quality monitoring time series and focuses on the theory and methods of TSDM. It was carried out from four perspectives: time series representation, similarity measures for Multivariate Time Series (MTS), time series forecasting, and application integration.

① The thesis analyzes the representation of time series and focuses on piecewise representation. By analyzing the basic ideas of piecewise linear and piecewise polynomial representation, the author combines the global continuity of the piecewise linear representation with the local shape retention of the piecewise polynomial representation and presents a piecewise polynomial continuous representation. The experimental results indicate that this representation retains the local-shape advantage of piecewise polynomials, has the global continuity of the piecewise linear representation, is compatible with the piecewise linear representation, and preserves the shape of the time series better. Because it maintains the shape of the series, the algorithm can be used for trend extraction and noise filtering and can be applied to preprocessing water quality time series data in the field of water environment safety.

② Based on an analysis of the Lp distance and DTW distance similarity measures for a single (univariate) time series (STS), the author introduces the similarity of spatial paths into the MTS similarity measure, determines the similarity of MTS by measuring the similarity of their spatial paths, and proposes an MTS similarity measure based on path DTW similarity, which is then applied to MTS clustering. The author compares clustering based on path DTW similarity with clustering based on path Euclidean distance similarity and with clustering based on accumulated STS similarity. The comparison shows that clustering based on path DTW similarity classifies the multivariate time series completely correctly, that clustering based on path Euclidean distance misclassifies only MTS whose differences are small, and that both methods are better than the method based on accumulated STS similarity. Clustering based on path DTW similarity achieved good practical results when used for river classification in the water environment.

③ To address the overfitting problem in neural network forecasting of time series, the author studies the basic principles of RBF networks and neural network ensembles, combines PCA with sample clustering, and proposes an RBF neural network ensemble forecasting method. Samples are formed by segmenting the time series, and PCA is applied to obtain new samples. The dimension of the new samples is used as the input dimension of each individual RBF network, the centers and radii of the new-sample clusters are selected in order as the parameters of the individual RBF networks, and the prior knowledge of each individual RBF network about the input/output relationship is introduced into the averaging ensemble. The experiments show that the forecast accuracy of the ensemble is higher than that of any individual RBF network, and the method achieved good results when applied to water quality forecasting.

④ The author studies how to integrate TSDM with the water environment safety early warning platform for the Three Gorges Reservoir. Based on a service request mechanism, the author designs a service-request-based application integration structure and an open integrated computing server architecture, and defines XML-based service request protocols between the application client and the computing service. The protocols cover model query, model catalog, computation request, and computation result.

The water quality time series data mining studied in this thesis can effectively extract trends and filter noise, classify rivers by their water quality indicators, and forecast water quality. The open integrated computing server architecture not only supports the extension of water quality model services but also meets the demand of other systems or platforms for these models. Practical application verified that the approach is feasible.
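
The path-based similarity described in item ② can be illustrated with a small sketch. The abstract does not give the thesis's exact definition of path DTW, so the code below assumes the common formulation in which each time step of a multivariate series is a point in space and DTW accumulates the Euclidean distances between aligned points. The function names `point_dist`, `path_dtw`, and `path_euclidean` are hypothetical and chosen only for illustration.

```python
import math


def point_dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def path_dtw(x, y):
    """DTW distance between two multivariate series (assumed formulation).

    x and y are sequences of points, one point per time step.
    The two series may differ in length.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = point_dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def path_euclidean(x, y):
    """Lock-step path distance: sum of pointwise distances, equal lengths only."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return sum(point_dist(p, q) for p, q in zip(x, y))


if __name__ == "__main__":
    # Two 2-dimensional series with the same shape, one delayed by a step.
    c = [(0, 0), (1, 1), (2, 2), (3, 3)]
    d = [(0, 0), (0, 0), (1, 1), (2, 2)]
    print("path DTW:      ", path_dtw(c, d))
    print("path Euclidean:", path_euclidean(c, d))
```

In this toy case the lock-step distance penalizes the one-step delay at every time step, while DTW absorbs most of it through warping. That is one plausible reason a DTW-based path measure separates similar series better in clustering, although the abstract itself reports only the experimental outcome.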

  • 【Online publication contributor】 Chongqing University
  • 【Online publication year/issue】 2012, Issue 07