Research on Several Key Technologies in Spatial Data Mining

【Author】 Jia Junjie (贾俊杰)

【Supervisor】 Zhang Qin (张勤)

【Author information】 Chang'an University, Earth Exploration and Information Technology, 2009, PhD

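【Summary】 Spatial data mining (SDM) is the extraction of implicit knowledge, spatial relations, or meaningful features and patterns that are not explicitly stored in a spatial database. The technique is important for understanding spatial data and for capturing the intrinsic relationships between spatial and non-spatial data. Because geographic information systems (GIS) have in recent years been applied widely across many industries, large amounts of location-related spatial data have accumulated, and spatial data mining has therefore become an important research topic. Against this background, the dissertation systematically discusses the basic theory of spatial data mining and concentrates on several of its key technologies. The results can be summarized as follows:
1. Building on existing work, the dissertation studies an entity information model that integrates position and attributes, and analyzes three spatial distance measures that can serve as basic criteria for spatial computation. By extending the spatial weight matrix, it introduces the concept of a spatial entity association matrix and analyzes how to construct it, providing a new basic tool for spatial data mining.
2. The dissertation describes the mixture model for model-based clustering and the basic Expectation Maximization (EM) algorithm. Although EM is general, its practical use is often limited by computational efficiency: every EM iteration passes over all sample points, so the computational load grows with the size of the data set. The dissertation therefore proposes the Increasing EM (IEM) algorithm, which saves computation by working on random subsamples. IEM runs on a subset rather than on the complete sample set, so fewer sample points have to be evaluated in each iteration, which substantially reduces running time. The size of the sample subset is selected automatically through EM's efficient likelihood-judgement condition and an increment factor. IEM improves computational efficiency without sacrificing the accuracy of the likelihood estimates.
3. The EM algorithm does not meet spatial clustering's requirement to take spatial information into account. The Neighborhood EM (NEM) algorithm incorporates a spatial penalty term, but it needs many iterations in its E-step. To satisfy the spatial requirement while avoiding excessive computation, the dissertation applies the idea behind IEM and proposes the Mixed Increasing NEM (MNEM) algorithm, which combines EM and NEM. MNEM first trains with EM on a random subsample until the likelihood-judgement condition begins to decrease, updates the subsample according to the increment factor, and then switches to NEM for a single training iteration. Cycling through this alternating, incremental training reduces computation and improves performance.
4. Multi-concept-level spatial association rules are mined from a spatial database containing predicates described by sets of relevant attributes. A multilevel association pattern is a frequent predicate set in which every item that makes up a predicate lies at a definite concept level. The dissertation proposes a new algorithm for mining multi-concept-level spatial association rules in a spatial database that stores the spatial predicates obtained through spatial queries and spatial computation. It builds a parent-element table and a frequent class-matching table from the spatial relations in relation table R, which makes mining multi-concept-level spatial association rules more convenient and efficient.
5. Directional information is one of the most important types of information in an image database, and the 9DLT (Nine Direction Lower-Triangular Matrix) representation is the basic way of expressing it. On this basis, the dissertation proposes the 9DLT Image Mining (9DIM) algorithm, which mines spatial association rules from an image database according to spatial directional relations. Each image is first converted into a 9DLT string, forming an image pattern database analogous to a transaction database in which each 9DLT string (image) is one transaction. From the relation patterns between image objects, a database of frequent (k-1)-patterns (k>2) is built, and frequent k-pattern trees are constructed from it, so that all frequent patterns among the objects can be discovered in turn. The algorithm is more efficient than Apriori.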
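The subsampling idea behind IEM (item 2 above) can be illustrated with a small sketch. It is not the author's implementation: it assumes a one-dimensional Gaussian mixture, interprets the likelihood-judgement condition as "the average log-likelihood per point improves by less than `tol`", and interprets the increment factor as a multiplier on the subsample size. The names `increasing_em`, `em_step`, `n0`, `increment_factor`, and `tol` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture fitted to `data`.

    Returns the updated parameters and the average log-likelihood per point
    under the parameters that were passed in.
    """
    k = len(weights)
    resp_sum = [0.0] * k
    sum_x = [0.0] * k
    sum_x2 = [0.0] * k
    loglik = 0.0
    for x in data:
        # E-step: responsibility of each component for point x
        dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = max(sum(dens), 1e-300)  # guard against underflow
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total
            resp_sum[j] += r
            sum_x[j] += r * x
            sum_x2[j] += r * x * x
    # M-step: responsibility-weighted re-estimation of weights, means, variances
    n = len(data)
    new_weights = [s / n for s in resp_sum]
    new_means = [sum_x[j] / max(resp_sum[j], 1e-12) for j in range(k)]
    new_vars = [max(sum_x2[j] / max(resp_sum[j], 1e-12) - new_means[j] ** 2, 1e-6)
                for j in range(k)]
    return new_weights, new_means, new_vars, loglik / n


def increasing_em(data, weights, means, variances,
                  n0=100, increment_factor=2.0, tol=1e-4, max_iter=500, seed=0):
    """Hypothetical IEM-style loop: run EM on a random subsample and enlarge
    it by `increment_factor` whenever the likelihood gain drops below `tol`."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n = min(n0, len(shuffled))
    previous = -math.inf
    for _ in range(max_iter):
        subset = shuffled[:n]  # nested subsamples: prefixes of one shuffle
        weights, means, variances, avg_ll = em_step(subset, weights, means, variances)
        if avg_ll - previous < tol:  # assumed judgement: gain stalled or negative
            if n == len(shuffled):
                break  # converged on the full data set
            n = min(len(shuffled), math.ceil(n * increment_factor))
            previous = -math.inf  # averages on a larger subset are not comparable
        else:
            previous = avg_ll
    return weights, means, variances, n


# Usage: two well-separated synthetic clusters centred at -3 and 3
rng = random.Random(1)
sample = [rng.gauss(-3.0, 1.0) for _ in range(2000)] + [rng.gauss(3.0, 1.0) for _ in range(2000)]
w, mu, var, used = increasing_em(sample, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print([round(m, 2) for m in sorted(mu)], "points in final subsample:", used)
```

Taking every subsample as a prefix of one shuffled copy keeps the subsamples nested, so each enlarged subset still contains the points the current parameters were fitted to. The comparison baseline is reset after each enlargement because average log-likelihoods computed on different subsets are not directly comparable.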

【Abstract】 Spatial data mining is the extraction of implicit knowledge and spatial relations from spatial databases, and the discovery of interesting characteristics and patterns that are not explicitly represented in them. The technique can play an important role in understanding spatial data and in capturing the intrinsic relationships between spatial and non-spatial data. In recent years geographic information systems (GIS) have been used in many fields, and because the amount of spatial data obtained from GIS and other sources has grown tremendously, spatial data mining has become an important current research task. Against this background, the author systematically discusses the basic theory of spatial data mining and studies several of its key technologies. The achievements of this dissertation can be summarized as follows:
1. Building on existing research, the author studies measures of spatial distance as a basic rule of spatial computation. By extending the spatial weight matrix, the author puts forward the concept of a spatial entity association matrix, analyzes methods for constructing it, and thereby offers a new basic tool for SDM.
2. The author describes the mixture model for model-based clustering and the classic form of the Expectation Maximization (EM) algorithm. Despite EM's widespread popularity, its practical usefulness is often limited by computational inefficiency: EM makes a pass through all of the available data in every iteration, so when the data set is large, every iteration can be computationally intensive. The author introduces the Increasing EM (IEM) algorithm for fast computation based on random subsampling. Using only a subset rather than the entire database yields significant computational savings, because far fewer data points need to be evaluated in each iteration. The author also argues that the subsets can be chosen intelligently by appealing to EM's likelihood-judgement condition and an increment factor. IEM can achieve significant computational improvements without sacrificing the accuracy of the results.
3. The EM algorithm is inappropriate for spatial clustering, which requires spatial information to be taken into account. Although the neighborhood EM (NEM) algorithm incorporates a spatial penalty term, it needs additional iterations in every E-step. To incorporate spatial information while avoiding too much additional computation, the author proposes the Mixed Increasing NEM (MNEM) algorithm, which combines EM and NEM. MNEM first trains with EM on a random subsample until the likelihood-judgement condition begins to decrease, updates the subsample, and then switches to NEM for one iteration. Because training cycles between the two in this way, MNEM reduces computational complexity and improves performance.
4. Multilevel spatial association rules are discovered from a spatial database in which all items of the predicates are described by a set of relevant attributes. A multilevel association pattern is a frequent predicate set in which each item constituting a predicate lies at a particular concept level. The author presents a new approach to discovering strong multilevel spatial association rules in spatial databases by separately storing the spatial predicates obtained by executing spatial queries and efficient spatial algorithms. A parent-element table and a frequent class-matched table are then constructed from the spatial relations stored in relation table R, which makes discovering multilevel spatial association rules easy and efficient.
5. Directional information is one of the most important types of information in an image database, and the Nine Direction Lower-Triangular Matrix (9DLT) representation is fundamental for expressing it. The author therefore proposes a novel spatial mining algorithm, 9DLT Image Mining (9DIM), to mine spatial association rules from an image database in which every image is represented by its 9DLT string. The resulting image pattern database is similar to a transaction database, because every 9DLT string represents one transaction. From the relation patterns among image objects, a frequent (k-1)-pattern database (k>2) is constructed, and frequent k-pattern trees are built from it, so that all frequent patterns among the objects can be mined. Because the proposed algorithm prunes most impossible candidates, it is more efficient than the Apriori algorithm.
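The transaction view that 9DIM relies on (item 5 above) can be sketched as follows. The direction numbering, the coordinate convention (y grows northwards), and the names `direction_code`, `encode_9dlt`, and `frequent_patterns` are assumptions for illustration, not the dissertation's definitions. The mining step is a plain level-wise, Apriori-style search, i.e. the baseline 9DIM is compared against, not the author's frequent k-pattern tree.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 9-direction numbering (check against the 9DLT variant in use):
# 0 = same position, 1 = N, 2 = NE, 3 = E, 4 = SE, 5 = S, 6 = SW, 7 = W, 8 = NW
DIRECTION = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3, (1, -1): 4,
             (0, -1): 5, (-1, -1): 6, (-1, 0): 7, (-1, 1): 8}


def direction_code(ax, ay, bx, by):
    """Direction of object b as seen from object a."""
    dx = (bx > ax) - (bx < ax)  # sign of the horizontal offset
    dy = (by > ay) - (by < ay)  # sign of the vertical offset
    return DIRECTION[(dx, dy)]


def encode_9dlt(objects):
    """Turn [(label, x, y), ...] into a set of (a, b, code) triples with a <= b,
    i.e. the filled cells of a lower-triangular relation matrix."""
    ordered = sorted(objects)
    return frozenset((a, b, direction_code(ax, ay, bx, by))
                     for (a, ax, ay), (b, bx, by) in combinations(ordered, 2))


def frequent_patterns(transactions, min_support, max_k=3):
    """Level-wise search for sets of relation triples contained in at least
    `min_support` images (transactions)."""
    counts = Counter(t for tx in transactions for t in tx)
    level = {frozenset([t]) for t, c in counts.items() if c >= min_support}
    result = {p: counts[next(iter(p))] for p in level}
    k = 2
    while level and k <= max_k:
        # join frequent (k-1)-sets whose union has exactly k triples
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = set()
        for cand in candidates:
            support = sum(1 for tx in transactions if cand <= tx)
            if support >= min_support:
                level.add(cand)
                result[cand] = support
        k += 1
    return result


# Usage: three toy "images", each a list of (label, x, y) objects
images = [
    [("A", 0, 0), ("B", 1, 0), ("C", 1, 1)],
    [("A", 0, 0), ("B", 2, 0)],
    [("A", 0, 0), ("B", 0, 2), ("C", 3, 3)],
]
transactions = [encode_9dlt(img) for img in images]
patterns = frequent_patterns(transactions, min_support=2)
for pattern, support in sorted(patterns.items(), key=lambda item: sorted(item[0])):
    print(sorted(pattern), support)
# prints:
# [('A', 'B', 3)] 2
# [('A', 'C', 2)] 2
```

Each image becomes one transaction whose items are relation triples, so any frequent-itemset method can be applied; the candidate-counting loop above is exactly the cost that a tree-based method such as 9DIM aims to reduce.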

  • 【Online publication submitter】 Chang'an University
  • 【Online publication year/issue】 2009, Issue 11