Research on Remote Sensing Image Classification Based on Decision Tree Technology

Application of Decision Tree Technology for Image Classification Using Remote Sensing Data

【Author】 Chen Xin (陈鑫)

【Advisor】 Peng Shikui (彭世揆)

【Author information】 Nanjing Forestry University, Forest Management, 2006, Master's thesis

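【Summary】 Decision tree classification has been applied to many classification problems, but published work applying it to remote sensing classification remains scarce. Decision tree classifiers are flexible, intuitive, transparent, robust, and computationally efficient, which gives them considerable advantages for remote sensing classification. Taking SPOT5 satellite imagery of the Conghua area of Guangzhou, Guangdong Province, as the study data, and motivated by the potential of decision tree algorithms for remote sensing image classification, this thesis examines six decision tree algorithms: four single-tree models (CART, CHAID, exhaustive CHAID, and QUEST) and two ensemble models (boosted trees and decision tree forests). It first describes the structure and theory of these algorithms, then applies them in a land cover classification experiment and compares the results with those of conventional maximum likelihood classification and artificial neural network classification. The results show the following.

1. In overall classification accuracy on the satellite imagery, decision tree classification outperforms the neural network and maximum likelihood classification. Compared with neural networks, decision trees train faster, execute more efficiently, and are more flexible and robust with respect to the feature space of the input data and the class labels. Decision trees also lack the black-box structure of neural networks, so they are transparent and easy for the analyst to interact with. The initial parameters of a neural network, by contrast, are set from practical experience, without effective theoretical guidance and with considerable subjectivity, and they must be adjusted repeatedly during training before a good network model is obtained. Neural network training is also very time-consuming, and the quality of the result cannot be judged in advance. In all these respects neural networks compare unfavorably with decision tree classification. (In this study the neural network reached a classification accuracy of 82%, better than CHAID, but only after several attempts with repeatedly adjusted initial parameters.) Compared with maximum likelihood classification, the tree-structured classifier does not need to assume a parametric density distribution for the data in feature space, so its overall classification accuracy is higher than that of the conventional parametric statistical classifier.

2. Among the decision tree techniques, the ensemble models (decision tree forest and TreeBoost) perform better overall than the single-tree models (CART, CHAID, exhaustive CHAID, and QUEST), but their black-box structure makes them harder to visualize and interpret than a single tree. The choice of model therefore depends on the situation: a single-tree model shows directly how each predictor variable contributes to the classification and yields clear class discrimination rules, whereas an ensemble model usually achieves higher predictive accuracy.

3. Among the ensemble models, TreeBoost improves classification accuracy by applying the prediction function repeatedly and weighting the output of each application so that the total prediction error is minimized. In a decision tree forest, by contrast, each tree is grown independently, and the trees do not influence one another until all of them have been built; the predictive accuracy of the forest results from combining that of its individual trees. Compared with single-tree models, ensemble models can markedly improve classification accuracy and can avoid overfitting, so they do not need pruning. In general, the more trees an ensemble model contains, the better it predicts.

4. Among the single-tree models, the methods differ mainly in how they select predictor variables and split points as the tree grows. In this study CART was the most accurate (86.5%), followed by QUEST (82.7%), CHAID (81.5%), and exhaustive CHAID (81.3%).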
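The contrast drawn in finding 3, between trees that are built independently and then combined and a predictor that is reapplied with weights driven by earlier errors, can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the two-feature toy data, the use of one-split "stumps" in place of full trees, and AdaBoost-style reweighting as a stand-in for boosting in general. It is not the TreeBoost or decision tree forest implementation used in the thesis.

```python
import math
import random

def fit_stump(X, y, w):
    """Find the one-split rule (feature, threshold, sign) with the lowest
    weighted error. Labels are +1 / -1. Returns (stump, weighted_error)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[f] <= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best[1:], best[0]

def stump_predict(stump, row):
    f, t, sign = stump
    return sign if row[f] <= t else -sign

def fit_forest(X, y, n_trees, rng):
    """Bagging: every stump is fit to its own bootstrap sample and never
    sees the other stumps."""
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        stump, _ = fit_stump(Xb, yb, [1.0 / n] * n)
        forest.append(stump)
    return forest

def forest_predict(forest, row):
    """Combine the independent stumps by unweighted majority vote."""
    votes = sum(stump_predict(s, row) for s in forest)
    return 1 if votes >= 0 else -1

def fit_boost(X, y, n_rounds):
    """AdaBoost-style boosting: each round refits a stump on sample weights
    that emphasise the samples earlier rounds got wrong."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this stump
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def boost_predict(model, row):
    """Combine the stumps by their weighted vote."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

def accuracy(predict, X, y):
    return sum(predict(row) == yi for row, yi in zip(X, y)) / len(y)

# Hypothetical toy data: two made-up "band" values per pixel, two classes.
X = [[1, 7], [2, 3], [3, 8], [4, 1], [6, 5], [7, 2], [8, 9], [9, 4]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

rng = random.Random(0)
forest = fit_forest(X, y, n_trees=25, rng=rng)
boosted = fit_boost(X, y, n_rounds=5)
forest_acc = accuracy(lambda row: forest_predict(forest, row), X, y)
boost_acc = accuracy(lambda row: boost_predict(boosted, row), X, y)
print(f"forest training accuracy:  {forest_acc:.3f}")
print(f"boosted training accuracy: {boost_acc:.3f}")
```

The toy classes are separable on the first feature alone, so the printed accuracies say nothing about the thesis results; the point is only the structural difference between `fit_forest`, where no stump depends on another, and `fit_boost`, where each round depends on the weights left by the previous one.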

【Abstract】 Choice of a classification algorithm is generally based upon a number of factors, among which are availability of software, ease of use, and performance, measured here by overall classification accuracy. The maximum likelihood (ML) procedure is, for many users, the algorithm of choice because of its ready availability and the fact that it does not require an extended training process. Artificial neural networks (ANNs) are now widely used by researchers, but their operational applications are hindered by the need for the user to specify the configuration of the network architecture and to provide values for a number of parameters, both of which affect performance. The ANN also requires an extended training phase.

In the past few years, the use of decision trees (DTs) to classify remotely sensed data has increased. Proponents of the method claim that it has a number of advantages over the ML and ANN algorithms. The DT is computationally fast, makes no statistical assumptions, and can handle data that are represented on different measurement scales. Pruning of DTs can make them smaller and more easily interpretable, while the use of TreeBoost and tree forest techniques can improve performance.

In this paper, we present several types of decision tree classification algorithms and evaluate them on SPOT5 remote sensing data sets of the Conghua area. The algorithms tested include single decision tree models (CART, CHAID, exhaustive CHAID, and QUEST) and ensemble decision tree models (TreeBoost and Decision Tree Forest). Classification accuracies produced by each of these decision tree algorithms are compared with those of both artificial neural network and maximum likelihood classifiers. The results are as follows.

(1) Decision trees in general, and the decision tree forest in particular, produced consistently higher classification accuracies than the ML algorithm. Several factors contribute to this result, the most important being that decision trees can adapt to the noisy and nonlinear relations often observed between land cover classes and remotely sensed data. Decision trees have the further advantage of being nonparametric and therefore make no assumptions regarding the distribution of input data.

(2) Most of the tested decision tree algorithms performed better than the ANN, although two (CHAID and exhaustive CHAID) did not. Neural networks, however, have a number of drawbacks. First, neural networks do not present an easily understandable model. Looking at a decision tree, it is easy to see that some initial variable divides the data into several categories and that other variables then split the resulting child groups. This information is very useful to the researcher who

  • 【Classification No.】S771.8
  • 【Times cited】28
  • 【Downloads】1407