
Research on Spatio-temporal Outlier Detection Algorithm (时空孤立点检测算法研究)

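【Author】 席景科 (Xi Jingke)

【Advisor】 谭海樵 (Tan Haiqiao)

【Author information】 中国矿业大学 (China University of Mining and Technology), Geoinformation Science, 2010, PhD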

【Abstract】 Data mining technology has advanced slowly relative to the rapid development of data collection technology, and the gap is especially pronounced in spatio-temporal outlier detection. The result is "spatial data explosion but knowledge poverty", and spatial data mining techniques are urgently needed to uncover the knowledge hidden in massive spatial datasets. Spatio-temporal outlier detection, an important branch of spatial data mining, aims to find spatio-temporal objects whose attributes differ significantly from those of the other objects in their spatio-temporal neighborhood. Such objects are few, so they are easily dismissed as data noise and ignored, yet identifying them can reveal unexpected and meaningful spatio-temporal patterns. Building on spatial data mining theory, this dissertation studies spatial and spatio-temporal outlier detection in depth and introduces information entropy theory and the LLE (locally linear embedding) dimensionality reduction algorithm into both problems to address shortcomings of existing algorithms.

First, the dissertation proposes a graph-based spatially weighted outlier detection algorithm. Most existing spatial outlier detection algorithms derive from traditional clustering or outlier detection methods: they use the spatial attributes of objects to determine spatial neighbors and only the non-spatial attributes to evaluate the differences between objects. This approach ignores the intrinsic link between the spatial and non-spatial attributes of spatial objects and does not fully exploit the contribution of spatial attributes to the computed differences. The proposed algorithm uses information entropy theory to compute importance factors for the spatial attributes and assigns weight coefficients to spatial neighbors accordingly, so that spatial and non-spatial attributes are combined when evaluating the differences between objects. Spatial outliers are then detected with a graph-based method. The algorithm thereby addresses the problem of spatial and non-spatial attributes being used in isolation from each other during spatial outlier detection.

Second, the dissertation proposes a spatio-temporal outlier detection algorithm based on an improved LLE. Spatio-temporal outlier detection is a relatively new research topic and faces several problems, including the definition of spatio-temporal neighbors, low algorithmic efficiency, and the inapplicability of traditional outlier detection methods. The algorithm first uses the improved LLE algorithm to map high-dimensional spatio-temporal data to low-dimensional data, which reduces the computation required, and then computes a spatio-temporal anomaly coefficient to detect spatio-temporal outliers. It takes the various attributes of spatio-temporal objects into account and preserves the local topological structure of the data during the mapping, which makes it suitable for detecting outliers in high-dimensional spatio-temporal datasets.

Finally, the dissertation designs and develops a prototype spatio-temporal outlier detection system. The prototype follows software engineering practice and targets the needs of both research and application. It has a relatively advanced architecture and good extensibility and practicality. Tests on real datasets show that the prototype performs the basic tasks of detecting and analyzing spatial and spatio-temporal outliers, which supports the effectiveness of the proposed algorithms.
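
The abstract does not give formulas for the graph-based spatially weighted algorithm, so the following is only a minimal sketch of the general idea, not the dissertation's method. It assumes the standard entropy-weight method for the spatial attribute importance factors, a k-nearest-neighbor graph built on entropy-weighted distance, inverse-distance neighbor weights, and an outlier score equal to the absolute difference between an object's non-spatial value and the weighted mean of its neighbors. The names `entropy_weights`, `outlier_scores`, and the parameter `k` are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy-weight method (assumed here): one weight per column, summing to 1.

    Requires at least two rows, since the entropy is normalised by log(n).
    """
    n, m = len(rows), len(rows[0])
    diversities = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo = min(col)
        shifted = [v - lo for v in col]  # shift so all values are non-negative
        total = sum(shifted)
        if total == 0:  # a constant column carries no information
            diversities.append(0.0)
            continue
        probs = [v / total for v in shifted]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    s = sum(diversities)
    if s == 0:  # fall back to equal weights
        return [1.0 / m] * m
    return [d / s for d in diversities]


def outlier_scores(coords, values, k=3):
    """Score each object against its k nearest neighbours (hypothetical scheme)."""
    w = entropy_weights(coords)
    scores = []
    for i, p in enumerate(coords):
        dists = []
        for j, q in enumerate(coords):
            if j != i:
                d = math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, p, q)))
                dists.append((d, j))
        dists.sort()
        nearest = dists[:k]  # edges of the kNN graph leaving object i
        inv = [1.0 / (d + 1e-9) for d, _ in nearest]  # closer neighbours weigh more
        expected = sum(c * values[j] for c, (_, j) in zip(inv, nearest)) / sum(inv)
        scores.append(abs(values[i] - expected))
    return scores


# Toy data: a 3x3 grid where only the centre cell has an unusual value.
coords = [(x, y) for x in range(3) for y in range(3)]
values = [10.0] * 9
values[4] = 50.0
scores = outlier_scores(coords, values, k=3)
suspect = scores.index(max(scores))  # index of the highest-scoring object
```

In this toy grid the centre object receives the highest score because all of its neighbors share the same value. Flagging an object as an outlier would additionally need a threshold or a top-n rule, which the abstract does not specify.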
