
Research on a Service-Based Data Mining System for Forest Resource Survey Data

【Author】 Li Guangshui (李广水)

【Advisor】 Song Dingquan (宋丁全)

【Author information】 Nanjing Forestry University, Forest Management, 2010, PhD

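【Summary】 Forest resource surveys are a core part of forestry work. As digital forestry develops, basic survey data are accumulating rapidly, and extracting valuable information from this mass of data is the main problem that data mining of forest resource survey data must address. Under the influence of the Internet's global development, application integration based on Web services has become a major trend for current and future information systems. This study examines the design and implementation of a forest resource survey data mining system built on Web services.

The thesis first summarizes how commonly used data mining algorithms have been applied to forest resource surveys in recent years, outlining the characteristics and scenario requirements of each algorithm in the field, and then introduces the related technologies. After analyzing Web-service-based data mining modes and their characteristics, data mining Web services were developed on the .Net platform under different modes: local data mining that performs attribute relevance analysis, and an improved Apriori algorithm that finds frequent itemsets in remote data. The system design focuses on access to large data sets, network resource usage, and code scalability.

On this basis, a decision-tree classification system built on subcompartment stand-factor survey data was constructed for the Jiuqushui Forest Farm (九曲水林场) experimental area. The concept of the quality of a feature data set is proposed and used to set the reduction threshold during attribute reduction, and the system is designed with .Net and WSBPEL. To address data fragmentation and repeated subtrees, which arise easily during decision tree construction, a method for building the feature data set from fractal dimension is further proposed. The thesis analyzes how removing redundant attributes according to fractal dimension and information gain, and the resulting information loss in the feature set, affect decision tree construction. This part closes with a comparison, based on experimental data, of the characteristics of the two decision tree induction approaches.

As an important data source for large-area surveys, remote sensing data play an increasingly important role in forest resource surveys. Based on the WCS standard, the thesis therefore studies a service-oriented remote sensing data mining mode and its workflow-based distributed system architecture, and uses the .Net framework and the WSBPEL process modeling language to design a system for mining association rules on remote sensing image texture. Building on this, feature extraction from remote sensing images based on frequent itemsets is proposed: the method first screens candidate frequent itemsets by their frequency and spatial distribution, then defines a spatial expressiveness value for each frequent itemset to build the feature set. The simulation is tested on remote sensing images. Because the EM algorithm is sensitive to its initial settings, the same feature set is clustered with different specified numbers of clusters, and the final clustering is chosen by comparing log-likelihood values. The results show that the proposed frequent itemsets discriminate the forested parts of the study area well. Subsequently, after analyzing the characteristics of the support vector algorithm and co-training theory, and drawing on the texture features of remote sensing images, the thesis proposes CTSVMTRS, an algorithm that co-trains support vector machines on two training sets built from texture feature values and pixel gray values respectively, and designs a distributed CTSVMTRS system.

The thesis mainly analyzes how to build Web-service-based data mining systems for both the transactional data of general forest resource surveys and remote sensing data. Through the system design of several classical algorithms it explores key implementation techniques and algorithm improvement strategies, and evaluates the designs experimentally. The simulations address two aspects: the feasibility of the Web system and the effectiveness of the improved algorithms. The results support the proposed ideas and indicate that the study has reference value for application integration in forestry informatization.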

【Abstract】 Forest resource surveys play an important role in forestry. With the wide adoption of digital forestry, basic forest survey data are accumulating quickly, so extracting valuable information from this mass of data has become the main problem for forest data mining. At the same time, under the influence of the Internet's global development, application integration based on Web services has become a major trend in information systems. This thesis studies the design and development of a Web-service-based data mining system for forest survey data. First, it summarizes recent applications of data mining to forest resource surveys and outlines the characteristics and scenario requirements of the algorithms used in this field. After analyzing Web-service-based data mining models and their characteristics, data mining Web services were developed on the .Net Framework under different models: an attribute relevance analysis that runs on the local server, and an improved Apriori algorithm that searches for frequent itemsets in remote datasets. The design work concentrates on access to large data sources, network resource usage, and code scalability. The third part tests each Web service architecture with data from the study area. A decision-tree system was built on subcompartment inventory data for the Jiuqushui Forest Farm in Fujian Province. The quality of a characteristic data set was defined and used to set the threshold for attribute reduction, and on this basis a distributed system was designed with the .Net Framework and WSBPEL.
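As a rough illustration of the frequent itemset search mentioned above, the following minimal sketch implements the textbook Apriori procedure (join, prune, count). It is not the improved variant developed in the thesis, and the transaction items (`pine`, `slope_steep`, and so on) and the `min_support` value are hypothetical placeholders rather than survey data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Textbook Apriori only; the thesis's improved variant is not reproduced.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the transactions per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical stand records, each reduced to a set of categorical items.
transactions = [
    {"pine", "slope_steep", "age_mature"},
    {"pine", "age_mature"},
    {"fir", "slope_steep"},
    {"pine", "slope_steep", "age_mature"},
]
result = apriori(transactions, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Each level requires a full pass over the transactions, which is one reason access to large remote data sets and network usage matter for a service-based deployment of this kind of search.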
To reduce data fragmentation and repeated subtrees when training the decision tree, the thesis presents a method for constructing the characteristic data set based on fractal dimension. It analyzes how dropping redundant attributes according to fractal dimension and information gain, and the resulting information loss in the characteristic data set, affect decision tree construction. This part closes by comparing the characteristics of the two kinds of decision tree on the experimental results. As a vital data source for large-area surveys, remote sensing data play an increasingly important role in forest resource surveys. The next part investigates a data mining model built on the Web services architecture and integrated by workflow. On this foundation, a system for mining remote sensing image data from texture feature values was designed with the .Net Framework and WSBPEL. The following chapter proposes extracting image features from frequent itemsets: the method first selects candidate frequent itemsets by their frequency and spatial distribution, then builds the feature set by defining a spatial expressiveness value for each frequent itemset. The simulation uses remote sensing images. Because the EM clustering algorithm is sensitive to its initialization, the experiment specifies different numbers of clusters for each feature set and selects the final clustering by log-likelihood.
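The role of information gain in dropping redundant attributes can be sketched as follows. This is a minimal example that uses plain information gain with a fixed cutoff; it does not reproduce the fractal dimension criterion or the data set quality measure from the thesis. The records, attribute names, and the `threshold` value are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def select_attributes(rows, attributes, target, threshold):
    """Keep only attributes whose information gain reaches the threshold."""
    return [a for a in attributes if information_gain(rows, a, target) >= threshold]


# Hypothetical subcompartment records with a categorical target.
rows = [
    {"species": "pine", "aspect": "north", "site_class": "good"},
    {"species": "pine", "aspect": "south", "site_class": "good"},
    {"species": "fir", "aspect": "north", "site_class": "poor"},
    {"species": "fir", "aspect": "south", "site_class": "poor"},
]
kept = select_attributes(rows, ["species", "aspect"], "site_class", threshold=0.1)
print(kept)
```

In this toy data `species` fully determines `site_class` while `aspect` carries no information about it, so only `species` survives the cutoff. Removing uninformative attributes before induction shortens the tree and reduces the small, fragmented partitions described above.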
The experimental results show that the frequent feature set discriminates the forested portion of the image satisfactorily. Co-training, as used in semi-supervised learning, can improve classification accuracy, and its key step is constructing two redundant data sets. After analyzing the SVM algorithm and the texture properties of remote sensing images, the thesis therefore presents CTSVMTRS, an algorithm that co-trains SVMs on two data sets, one built from pixel values and the other from computed texture properties, and designs a system that runs it as Web services. Overall, the thesis analyzes how to design a Web-service-based data mining system for both the transactional data of ordinary forest resource surveys and remote sensing data. Several classical data mining algorithms were implemented in this system, and the key techniques and algorithm improvement strategies were studied during implementation. Two kinds of tests were carried out to check the designs: one on the Web service architecture and the other on the improved algorithms as applied to forest resource surveys. The results indicate that the methods are effective and that the research has some reference value for forestry information system integration.

【Key words】 Forest Resources Survey; Data Mining; WEB Services