A Study of Soil Heavy Metals Pollution Assessment System Supported by Data Mining Technology

【Author】 Cheng Wei (成伟)

【Supervisor】 Wang Ke (王珂)

【Author information】 Zhejiang University, Agricultural Resource Utilization, 2009, Ph.D.

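【Abstract (Chinese)】 (1) Significance of the study. Soil is not only the foundation of agricultural production but also an important component of the human living environment, and it holds a unique and important position in the ecosystem. The overall situation of soil pollution in China is currently quite severe. The causes of spatial variation in soil heavy metals are complex, including not only non-anthropogenic factors such as parent material, climate, and topography, but also anthropogenic factors such as fertilization, pesticides, land use patterns, and industrial pollution. The variation of soil heavy metals and its mechanisms also differ across scales. Spatio-temporal changes in soil heavy metals caused by human activities (industrial and agricultural production) or by natural processes (mineralization of soil parent material) make spatio-temporal attribute data on soil heavy metals more complex, and the relationships among different heavy metals in soil also show complex spatial correlation and variability. Exploring scientific methods for assessing soil heavy metal pollution and developing corresponding assessment software is therefore of great practical significance for protecting soil environmental quality, reducing soil pollution, and safeguarding China's soil ecological security and food production security.
(2) Main research content and methods. Although geostatistics has been very successful in studying the spatial variation of soil heavy metal pollution, it has limitations. First, geostatistics requires the intrinsic hypothesis to hold, and it is unclear whether the spatial variability of different soil properties satisfies it. Second, the choice of the fitted semivariogram model is strongly affected by subjective judgment. Third, the number of samples and the sampling layout directly affect the estimated degree of variation, and determining sampling strategies economically, efficiently, and reliably remains difficult. Moreover, Kriging estimates values in unsampled areas from the data of surrounding known sample points, so when the spatial distribution of soil heavy metal pollution is simulated and assessed, information on many sharply varying anomalous areas is lost through Kriging smoothing. In Fuyang, soil heavy metal contents are strongly affected by human activities, especially industry, and their spatial autocorrelation is quite low; statistical analysis shows that they vary strongly and are not normally distributed. In addition, GIS-based spatial analysis can often yield only qualitative results and has difficulty accounting for interactions among human activities. Spatial data mining can effectively make up for these shortcomings. In building a regional soil heavy metal pollution assessment system, this study therefore introduces data mining methods such as decision trees and fuzzy comprehensive evaluation, which better suit pollution that varies strongly in degree but is relatively concentrated in particular areas, and establishes a preliminary soil heavy metal pollution assessment system based on component-based GIS to improve the specificity, accuracy, and efficiency of assessment.
(3) Main conclusions. Applying the decision tree method (CART) to assess soil Zn pollution across Fuyang raised the assessment accuracy from 41.79% (ordinary Kriging) to 89.39% and 87.18% (training and test sets), and the Kappa coefficient from 0.2584 to 0.8296 and 0.8018 (training and test sets), a fairly satisfactory result. The decision tree method cannot replace geostatistical interpolation; the two are complementary here. Moreover, an administrative region contains towns with intensive industrial activity but also a considerable number of towns whose industrial activity remains low because of natural conditions, transportation, and other factors affecting their development. In these towns the spatial distribution of heavy metals is governed mainly by natural factors such as parent material and soil type, so the basic conditions for Kriging still hold, and a decision-tree-style assessment has shortcomings. This study therefore proposes a fuzzy mathematics approach: first compare the intensity of spatial variation in heavy metal distribution among townships, then choose a simulation method suited to the characteristics of each township or group of townships. Using fuzzy comprehensive evaluation, a data mining method, membership degrees were calculated from the differences among townships in industrial land, construction land, industry type, internal transportation, and distance to major transport routes, and the townships were classified by membership degree into five pollution grades, from 1 (least polluted) to 5 (most severely polluted). Geostatistical or decision tree methods were then applied to each area or township accordingly. Most areas, including those with very intensive industrial activity, reached an assessment accuracy above 95%, and lightly polluted townships such as Huyuan Township and Changlv Town (both with membership degree 1) exceeded 98%. Finally, tight coupling of GIS and mathematical modeling provides strong technical support. Spatial representation is important in solving environmental, social, and economic problems, yet existing GIS software lacks the ability to predict complex problems and related analytical capabilities, and the complex non-spatial processes described by most mathematical models are beyond ordinary GIS systems; application modeling software, conversely, lacks a sufficiently flexible GIS-like spatial analysis environment and is therefore hard for non-specialist users to adopt. Combining GIS with application models lets each complement the other, and tightly coupling them is an effective way to simulate and assess the spatial variation of soil heavy metal pollution.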
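The township grading described in (3) can be sketched with the widely used weighted-average operator of fuzzy comprehensive evaluation, B = W · R, followed by the maximum-membership principle. This is a minimal sketch under stated assumptions: the abstract names the five factors but not their weights, membership functions, or composition operator, so `fuzzy_grade`, `choose_method`, the weights, and the membership rows below are hypothetical. The method-selection rule follows the abstract for grade 1 (ordinary Kriging) and grades 4 and 5 (decision tree); routing grades 2 and 3 to Kriging is an added assumption.

```python
GRADES = ("unpolluted", "light", "moderate", "heavy", "severe")


def fuzzy_grade(weights, membership):
    """Fuzzy comprehensive evaluation with the weighted-average operator.

    weights[i]: weight of factor i (assumed to sum to 1).
    membership[i][k]: degree to which factor i places the township in grade k.
    Returns (B, grade): B = W . R normalized to sum to 1, and the grade
    (1..5) chosen by the maximum-membership principle (ties -> lower grade).
    """
    n_grades = len(membership[0])
    b = [sum(w * row[k] for w, row in zip(weights, membership))
         for k in range(n_grades)]
    total = sum(b)
    if total == 0:
        raise ValueError("all composite memberships are zero")
    b = [x / total for x in b]
    grade = max(range(n_grades), key=lambda k: b[k]) + 1
    return b, grade


def choose_method(grade, cart_threshold=4):
    """Hypothetical rule: CART where autocorrelation is likely destroyed
    (grade >= cart_threshold), ordinary Kriging otherwise."""
    return "CART" if grade >= cart_threshold else "ordinary kriging"


# Hypothetical weights and membership rows for one lightly industrialized township.
weights = [0.30, 0.20, 0.20, 0.15, 0.15]
township = [
    [0.6, 0.3, 0.1, 0.0, 0.0],  # industrial land
    [0.5, 0.3, 0.2, 0.0, 0.0],  # construction land
    [0.7, 0.2, 0.1, 0.0, 0.0],  # industry type
    [0.4, 0.4, 0.2, 0.0, 0.0],  # internal transportation
    [0.5, 0.3, 0.2, 0.0, 0.0],  # distance to major transport routes
]
b, grade = fuzzy_grade(weights, township)
print(grade, GRADES[grade - 1], choose_method(grade))  # 1 unpolluted ordinary kriging
```

The weighted-average operator is used here because it lets every factor contribute; other composition operators (for example max-min) are common too and can yield a different grade for the same inputs.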

【Abstract】 Alongside air, water, and the biota, soil is of central significance in ecosystem research because it is where many kinds of interactions take place between minerals, air, water, and the living environment. Heavy metals in soil originate from the weathering of parent material and/or from numerous external contaminating sources. In unpolluted regions, parent materials are the primary source of trace elements. Human activities, such as non-point-source agricultural activities, emissions from nearby point sources such as smelters, and traffic, may also affect the chemical composition of soil. In urban areas, the deposition of pollutants emitted to the atmosphere from point sources such as residential heating and industrial facilities and from mobile sources such as traffic is the primary source of soil pollution. Providing a sustainable environment, including soil free of heavy metal pollution, requires society to consider several aspects, the first of which is to identify environments (or areas) in which anthropogenic loading of heavy metals puts ecosystems and their inhabitants at health risk. Based on soil sampling point data, a DEM, a soil map, and a land use map, the present study explored the intertwined influences of these factors on heavy metal concentrations in soils and evaluated the quality of arable land considering these complex influencing factors.
(1) To understand the spatial distribution of soil heavy metal contents and identify pollution sources, multivariate analysis, geostatistical methods, and spatial analysis have been developed and widely applied to soil systems. However, these methods cannot efficiently simulate the spatial distribution of heavy metals that are strongly influenced by human activities, because a prerequisite of these methods is to quantify the spatial autocorrelation between properties at different locations so that information from samples can be weighted into an estimator of the values at unsampled locations. As a new modeling approach, data mining (fuzzy comprehensive judgment and decision tree models) has been shown to achieve high predictive accuracy, and geographic information systems (GIS) have increasingly become a valuable management tool, providing an effective infrastructure for managing, analyzing, and visualizing disparate datasets related to soils, topography, land use, and land cover. Integrating data mining with GIS therefore offers a potential solution to this challenge.
(2) Fuyang County is taken as representative of counties in the Yangtze River Delta, where the economy has grown at an unprecedented rate since the economic reform of 1978 and where soils have been heavily contaminated by industrial waste, mining, vehicle emissions, and other sources. Statistical analysis showed that Cu, Zn, Pb, and Cd had been added by external factors, whereas Ni was mainly controlled by natural factors. Combining multivariate statistical and geostatistical analysis successfully separated the heavy metals into three groups from different sources (Cu, Zn, and Pb; Cd; and Ni). Pollution evaluation showed that 15.76% of the study area for Cu, Zn, and Pb, and 46.14% for Cd, suffered from moderate or severe pollution. Further spatial analysis identified limestone mining, paper mills, cement factories, and metallurgical activities as the main sources of Cu, Zn, Pb, and Cd in soils, while soil Ni was mainly determined by parent material. The semivariogram simulated for the raw Zn data of Fuyang was a horizontal line, indicating that soil Zn content was strongly influenced by external factors.
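A horizontal semivariogram means the average squared difference between samples is about the same at every separation distance, i.e. a pure nugget effect with no usable spatial structure. The minimal sketch below implements the classical (Matheron) estimator together with a Gaussian model function that can be compared against the fitted model reported next (nugget 0.21, sill 0.43, range 9508 m). The function names, the distance binning, the practical-range convention (factor 3 in the exponent), and treating 0.43 as the total sill are assumptions, since the abstract does not state which conventions were used; the four-point transect is a toy input, not thesis data.

```python
import math


def experimental_semivariogram(points, values, lag, n_lags):
    """Classical (Matheron) semivariogram estimator.

    gamma(h) = 1 / (2 * N(h)) * sum of (z_i - z_j)**2 over the N(h) pairs
    whose separation falls in the bin [k*lag, (k+1)*lag).
    Returns (mean pair distance, gamma, pair count) for each non-empty bin.
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = int(d // lag)  # distance bin index
            if k < n_lags:
                sq_sums[k] += (values[i] - values[j]) ** 2
                dist_sums[k] += d
                counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


def gaussian_model(h, nugget, sill, practical_range):
    """Gaussian semivariogram model. `sill` is treated as the TOTAL sill and
    `practical_range` as the distance where ~95% of the partial sill is
    reached (both assumed conventions)."""
    if h == 0:
        return 0.0  # gamma(0) = 0 by definition; the nugget is a jump for h > 0
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / practical_range) ** 2))


# Toy transect (illustrative values only).
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
zs = [1.0, 3.0, 1.0, 3.0]
print(experimental_semivariogram(pts, zs, lag=1.0, n_lags=4))
# [(1.0, 2.0, 3), (2.0, 0.0, 2), (3.0, 2.0, 1)]
print(round(gaussian_model(9508.0, 0.21, 0.43, 9508.0), 4))  # 0.419
```

For raw data dominated by local hotspots, the estimator tends to return roughly the sample variance in every bin, which plots as the horizontal line described in the text.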
Thus, the Box-Cox transformation was used to obtain normally distributed soil Zn contents. The experimental semivariogram showed that the Box-Cox transformed Zn contents are best fitted by a Gaussian model dominated by a long-range structure. The nugget, sill, and range of this semivariogram are 0.21, 0.43, and 9508 m, respectively, and the coefficient of determination (r²) is 0.72. In this study, the classification and regression tree (CART) was integrated with GIS to predict the spatial distribution of soil heavy metal contents in Fuyang County. The overall CART accuracy of assigning samples to the correct Zn classes is 89.39% and 87.18%, and the Kappa coefficient is 0.8296 and 0.8018, for the training and test data, respectively. This is a great improvement over the ordinary Kriging method in ArcGIS, for which the overall accuracy of assigning kriged estimates of Zn classes to measured values is 41.79% and the corresponding Kappa coefficient is 0.2584. The main reason for the increased accuracy may be that Zn content in the study area is strongly influenced by human activities, which produce localized sharp variations and hotspots that Kriging with a long-range variogram smooths over. Admittedly, CART reduces the raw data to a lower measurement scale, from continuous concentrations to classes of the target variable. Such classifications are nevertheless useful when detailed concentrations are not required, because decision makers and spatial planners need soil quality information for purposes such as locating areas suitable for organic (ecologically clean) farming and agro-tourism; selecting sites for converting agricultural to non-agricultural land, particularly for urbanization; setting up protection zones for groundwater pumped for drinking water; and estimating the costs of remediating contaminated areas. Of course, CART cannot replace Kriging for predicting heavy metal concentrations at unsampled points; the two methods each have their own advantages and disadvantages in simulating the spatial distribution of soil heavy metal concentrations.
(3) The fuzzy comprehensive judgment method was also integrated with GIS to predict the spatial distribution of soil heavy metal contents in Fuyang County. First, the membership grade of each township in Fuyang was obtained by fuzzy comprehensive judgment, using five grades: unpolluted, light, moderate, heavy, and severe pollution. A different method was then used to simulate the spatial distribution of soil heavy metal contents depending on the grade; both decision tree classification and Kriging interpolation were used. For example, Huyuan and Changlv towns have a membership grade of 1, meaning they are the most lightly polluted. Because human activities had not severely destroyed the spatial autocorrelation in these two towns, ordinary Kriging was used to simulate the spatial distribution of their soil heavy metal contents. Where spatial autocorrelation was severely destroyed, i.e., in towns with a high grade (4 or 5), the decision tree model was applied instead. This divide-and-conquer approach gave satisfactory predictions of soil heavy metal contents: most towns in Fuyang County achieved an assessment accuracy of 95% or higher.
(4) Integration offers two primary advantages because it addresses two complementary shortcomings: ① spatial representation is critical to environmental problem solving, but GIS currently lacks the predictive and related analytical capabilities needed to examine complex problems; ② modeling tools typically lack sufficiently flexible GIS-like spatial analysis components and are often inaccessible to potential users less expert than their developers.
The developed system seamlessly links ArcObjects and the data mining models, automating the transfer of parameters and data and displaying the analysis results graphically. It also eliminates the errors inherent in manual data transfer. Successful implementation of the data mining models requires integrating GIS, multiple databases, and visualization tools to extract the required model input parameters and to analyze and visualize the simulated results. The developed system provides an approach for assessing the spatial distribution of soil heavy metal contents with high predictive accuracy and for presenting model predictions spatially for further application and investigation.
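The CART and Kriging accuracy figures reported in (2), overall accuracy and the Kappa coefficient, both come from a confusion matrix of measured versus predicted Zn classes. A minimal sketch of the standard computation (Cohen's Kappa) follows; the function name and the 3-class matrix are hypothetical illustrations, not thesis data, and the abstract does not state the Zn class boundaries.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's Kappa) for a square confusion matrix.

    confusion[i][j] = number of samples whose measured class is i and
    whose predicted (e.g. CART or kriged) class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement p_o: share of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement p_e from the row (measured) and column (predicted) totals.
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_e == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical 3-class confusion matrix (rows: measured, columns: predicted).
cm = [
    [20, 3, 1],
    [4, 15, 2],
    [1, 2, 12],
]
oa, kappa = accuracy_and_kappa(cm)
print(round(oa, 3), round(kappa, 3))  # 0.783 0.669
```

Because Kappa subtracts the agreement expected by chance from the class totals, it never exceeds the overall accuracy, which is consistent with each reported Kappa value lying below its corresponding accuracy.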

  • 【Online publication submitted by】 Zhejiang University
  • 【Online publication year/issue】 2010, Issue 12