The Study on Data Mining and Applicability of Model for Predicting Surface Water Quality
【Author】 Zhao Ying
【Author Information】 Harbin Institute of Technology, Municipal Engineering, 2008, Ph.D.
      
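      【Abstract (Chinese version)】 In recent years, as China's industrialization and urbanization have accelerated, environmental pollution has become increasingly serious and sudden environmental pollution incidents occur frequently, with major consequences for human health, ecological security, production, and daily life. The Songhua River is the main drinking water source of Heilongjiang Province, yet the drinking water sources on its main stream are now seriously polluted, which has severely affected production and daily life; modern methods are therefore urgently needed for the scientific management of water quality in the Songhua River Basin. This dissertation takes the Songhua River Basin as its main study area for research on water quality prediction models, and the work has both theoretical and practical value. Water quality data are collected by an on-line water quality monitoring system and serve as the source of the training set for the prediction model. To ensure high prediction accuracy, the raw water quality data must be processed into a training set that meets the model's requirements: the data are divided into monthly periods, and cluster analysis is applied to remove abnormal data so that the valid data are evenly distributed, which improves prediction accuracy. Cluster analysis is also used to select the optimal influencing factors from the data distribution between the many candidate factors and the prediction target; choosing suitable influencing factors contributes substantially to prediction accuracy. Tests verify the effect of cluster-analysis preprocessing on prediction accuracy. Water pollution is divided into two categories: conventional pollution and sudden pollution. For conventional pollution, artificial neural network technology is used and a conventional water quality prediction model is built in MATLAB. The prediction site is the Sifangtai monitoring station in Harbin, and the prediction target is CODMn, the main indicator of water pollution in the Songhua River Basin. Cluster analysis identified six parameters, including same-day CODMn and water temperature, as the influencing factors; daily monitoring data from the Sifangtai station for 1997-1999 form the sample set, and the six same-day factors are used to predict CODMn for the following three days. To keep its prediction performance, the model must be continually updated and maintained. To improve prediction accuracy, the cluster centers are used as the initial weights for training, and the models obtained with and without this initialization are applied to water quality prediction and compared. In the study of prediction performance across water quality periods of the same water body, the Songhua River year is divided into the high-flow (rainy) period, the ice-covered period, and other periods. Prediction performance differs between periods: it is best in the ice-covered period, worst in the high-flow period, and intermediate in the other periods. In the study of the regional specificity of prediction models, the Songhua River model and the Yuqiao Reservoir model are each used for cross prediction at both sites. The comparison shows that when different models are applied to different water bodies for the same prediction target, their prediction mechanisms share some common features, so the predicted future changes in water quality are also similar. In some special circumstances, therefore, a model built for another water body can serve as a reference for understanding future water quality changes in the local water body. However, a model performs best only in the water body for which it was built and worse elsewhere, so in practice the local model should be used whenever possible. In the study of prediction performance for different water bodies, comparing the Songhua River and Yuqiao Reservoir models, which have different water quality characteristics, shows that the conventional model performs poorly in water bodies whose water quality is strongly influenced by external factors and whose flow velocity and water renewal are fast, and performs better otherwise. The dissertation also studies embedded integration of the conventional and sudden-pollution water quality prediction models. Three main approaches to integrating GIS with application analysis models are compared and, given the characteristics of this work, tight coupling of the prediction model with GIS is selected. ActiveX automation is used to solve the multi-objective optimization of integrating MATLAB with VB, so that MATLAB functions provide the data input and output of the water quality prediction model on the GIS platform, enabling scientific prediction of water quality over various time periods. In the application study of the conventional model, its functions are introduced and it is used to predict CODMn at the Sifangtai monitoring station on the Songhua River in August 2006. The average prediction error is 4.79%, which is within the allowable range, so the model has guiding value for practical water quality management. In the study of embedded system integration, the surface water simulation model SMS is chosen as the sudden-pollution prediction model; the system calls SMS to build a visual pollutant transport model for analyzing surface water quality, flow velocity, flow regime, and so on. This research on data mining techniques for surface water quality prediction models and their applicability provides a scientific basis for guiding water plant production and for the scientific management of, and decision-making on, the surface water environment.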
【Abstract】 In recent years, as China's industrialization and urbanization have accelerated, environmental pollution has grown increasingly serious, and frequent sudden pollution incidents have had a major impact on human health, ecological security, production, and daily life. The drinking water sources on the main stream of the Songhua River, the principal source of drinking water in Heilongjiang Province, are seriously polluted, which significantly affects production and daily life, so modern methods are urgently needed for the scientific management of water quality in the Songhua River Basin. This dissertation develops a water quality prediction model for the Songhua River Basin; the study has both theoretical and practical value. The water quality data are collected by an on-line monitoring system and serve as the training-set source for the prediction model. To ensure high prediction accuracy, the raw water quality data must be pretreated into a training set that meets the model's requirements. The data are divided into monthly periods and pretreated with cluster analysis to reject abnormal data, so that the valid data are evenly distributed and prediction accuracy improves. In addition, cluster analysis is used to select the optimal influencing factors from the data distribution between the candidate factors and the prediction target; the choice of factors contributes substantially to the accuracy of the prediction model. The dissertation also tests how cluster-analysis pretreatment affects prediction accuracy. Water pollution is divided into two types: conventional pollution and sudden pollution. 
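The monthly cluster-analysis pretreatment described above can be illustrated with a minimal sketch. It assumes k-means as the clustering method (the abstract says only "cluster analysis"), min-max scaling of the factors within each month, and a hypothetical rejection rule in which records that fall into a cluster holding less than `min_share` of the month's data are treated as abnormal. The names `kmeans`, `reject_abnormal`, `k`, and `min_share` are illustrative and are not taken from the dissertation.

```python
from collections import defaultdict
from math import dist


def min_max_scale(points):
    """Scale every factor to [0, 1] so no single unit dominates the distances."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [tuple((v - l) / s for v, l, s in zip(p, lo, span)) for p in points]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def reject_abnormal(records, k=3, min_share=0.1):
    """Group (month, factors) records by month and drop members of tiny clusters."""
    by_month = defaultdict(list)
    for month, values in records:
        by_month[month].append(values)
    kept = {}
    for month, points in by_month.items():
        _, labels = kmeans(min_max_scale(points), min(k, len(points)))
        sizes = defaultdict(int)
        for lab in labels:
            sizes[lab] += 1
        # Hypothetical rule: a cluster holding < min_share of the month's records is abnormal.
        threshold = min_share * len(points)
        kept[month] = [p for p, lab in zip(points, labels) if sizes[lab] >= threshold]
    return kept
```

Clustering is done on scaled values, but the original records are returned, so downstream training still sees the measured CODMn and water temperature. A real implementation would tune `k` and the rejection rule per month rather than fixing them.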
For conventional water quality prediction, an artificial neural network is used and the model is built in MATLAB. The prediction site is the Sifangtai monitoring station in Harbin, and the prediction target is CODMn, the main indicator of water pollution in the Songhua River Basin. Cluster analysis determines six influencing factors, namely same-day CODMn, water temperature, and four other parameters; daily water quality monitoring data from the Sifangtai station for 1997-1999 form the sample set, and the six same-day factors are used to predict CODMn for the next three days. The prediction model must be continually updated and maintained to keep its forecasting performance. To improve prediction precision, the cluster centers are used as the initial weights for training, and the models obtained with and without this initialization are applied to water quality prediction and compared. In the study of prediction performance in different water quality periods of the same river, the Songhua River year is divided into the high-flow (rainy) period, the freezing period, and other periods, and prediction performance is examined in each. The results show that performance varies between periods: it is best in the freezing period, worst in the high-flow period, and intermediate in the rest. The regional specificity of prediction models is studied by cross prediction, applying both the Songhua River model and the Yuqiao Reservoir model to both sites. When different models are used to predict the same target in different water bodies, their prediction mechanisms share some common features, so the predicted future changes in water quality are also similar. 
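A sketch of how the training pairs and the cluster-center initialization might look, under stated assumptions: each day is a tuple of the six selected factors with CODMn in position 0, "the next three days" is read as one CODMn target per day for days t+1 to t+3, and each cluster center (from any k-means run on the scaled inputs) becomes the input-weight vector of one sigmoid hidden neuron. Backpropagation training is omitted, and `make_samples`, `init_hidden_from_centres`, and `hidden_activations` are hypothetical names, not the dissertation's MATLAB code.

```python
from math import exp


def make_samples(daily, horizon=3):
    """Pair day t's factor vector with CODMn on days t+1 .. t+horizon.

    `daily` is a chronological list of equal-length tuples; position 0 is
    assumed to hold that day's CODMn.
    """
    inputs, targets = [], []
    for t in range(len(daily) - horizon):
        inputs.append(daily[t])
        targets.append(tuple(daily[t + h][0] for h in range(1, horizon + 1)))
    return inputs, targets


def init_hidden_from_centres(centres, bias=0.0):
    """One hidden neuron per cluster center; the center becomes its input weights."""
    weights = [list(c) for c in centres]
    biases = [bias] * len(centres)
    return weights, biases


def hidden_activations(x, weights, biases):
    """Sigmoid hidden-layer output for one (already scaled) input vector."""
    return [
        1.0 / (1.0 + exp(-(sum(w * v for w, v in zip(row, x)) + b)))
        for row, b in zip(weights, biases)
    ]
```

The last `horizon` days of a series yield no training pair because their future CODMn is not yet known, which is one reason the model has to be retrained as new monitoring data arrive.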
Therefore, in some special circumstances, using another water body's prediction model to forecast local water quality can serve as a reference for understanding future water quality changes. The comparison also shows that a model gives its best results in the water body for which it was built and worse results elsewhere, so in practice the local model should be used wherever possible. Comparing the prediction performance of the Songhua River and Yuqiao Reservoir models, which have different water quality characteristics, shows that the conventional model performs poorly in water bodies whose water quality is strongly affected by external factors and whose flow velocity and water renewal are fast; otherwise, the results are better. The dissertation studies the embedded integration of the conventional and sudden-pollution water quality prediction models, compares the three main approaches to integrating GIS with application analysis models, selects tight coupling of the prediction model with GIS, and uses ActiveX automation to solve the multi-objective optimization of integrating MATLAB with VB. MATLAB functions thereby provide the data input and output of the water quality model on the GIS platform, enabling water quality forecasts for different time periods. In the application study, the functions of the conventional prediction model are introduced, and the model is applied to forecast CODMn at the Sifangtai monitoring station on the Songhua River in August 2006. The average prediction error is 4.79%, within the allowable range, so the model offers guidance for practical water quality management. In the study of embedded system integration, the SMS surface water modeling model is chosen as the sudden-pollution water quality prediction model. 
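The abstract does not define its "average prediction error". Assuming it is the mean absolute relative error expressed as a percentage, a figure like the reported 4.79% would be computed as in this short sketch; the function name is hypothetical, and the numbers in the usage line are invented for illustration, not taken from the August 2006 data.

```python
def mean_relative_error(observed, predicted):
    """Mean absolute relative error between observed and predicted CODMn, in percent."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


print(round(mean_relative_error([4.0, 5.0], [4.2, 4.5]), 2))  # 7.5
```

A relative measure is used here because CODMn levels differ between seasons, so an absolute error of 0.2 mg/L means more at low concentrations than at high ones.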
The system calls SMS to build a visual pollutant transport model and to analyze surface water quality, velocity, flow patterns, and so on. This study of data mining and the applicability of surface water quality prediction models provides a scientific basis for guiding water plant production and for the scientific management of, and decision-making on, the surface water environment.
【Key words】 water quality data; cluster analysis; water quality prediction; prediction model; embedded integration technology;