Research on Data Clustering Methods in Wireless Sensor Networks (无线传感器网络中数据聚类方法的研究)

【Author】 黄江华

【Advisor】 张军英

【Author Information】 Xidian University (西安电子科技大学), Computer Application Technology, 2014, PhD

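【Summary】 With the rapid development of wireless communication, microelectronics, and embedded computing, wireless sensor networks have been widely applied in military defense, environmental monitoring, transportation, and many other fields. Processing the massive volume of data in these networks efficiently, and extracting useful knowledge from it, has become a new challenge; cluster analysis from data mining is one way to address it. However, because sensor nodes have limited resources and their data are correlated in time and space, traditional data clustering methods are difficult to apply directly to wireless sensor networks. Targeting the characteristics of node data in wireless sensor networks, this thesis proposes several new methods and ideas and applies them to such networks. The main contributions are as follows:

1. Because sensor nodes have limited resources and node data carry both location information and sensed values, a grid-based distributed dual clustering algorithm is proposed. The algorithm has two levels: local clustering and global clustering. The data space is partitioned into hyper-rectangular grid cells according to node locations and sensed data; adjacent grid cells are merged into connected regions, which form the local clusters; data features are abstracted from the local clusters and sent to the sink node, where global clustering is performed. The algorithm lowers the network's energy consumption by reducing the single-hop communication distance of sensor nodes and the amount of data transmitted. Experimental results show that the algorithm clusters node data well, scales well with data set size, handles large-scale data sets, and discovers clusters of arbitrary shape.

2. Because node data carry both location information and sensed values, a dual clustering algorithm based on fuzzy c-means is proposed. The algorithm incorporates the location information of sensor nodes into the traditional fuzzy c-means algorithm and modifies the membership function, which improves performance. Since wireless sensor networks are dynamic, the number of classes is hard to determine in advance, so subtractive clustering is used to determine the number of classes and the initial class centers, which speeds up convergence and avoids getting trapped in local optima. To cope with limited node resources, clustering is performed in a distributed manner, which reduces the single-hop communication distance of sensor nodes and the amount of data transmitted, and so lowers the network's energy consumption. Experimental results show that, compared with traditional clustering algorithms, the algorithm clusters better and consumes less network energy.

3. Because the data of neighboring nodes in a sensor network are strongly correlated, a fuzzy c-means clustering algorithm based on spatial constraints is proposed. Borrowing an idea from image segmentation, the algorithm adds a fuzzy factor to the traditional fuzzy c-means algorithm. The fuzzy factor incorporates the location information and sensed data of neighboring sensor nodes, so that the nodes within a cluster are close in location and similar in sensed data. The algorithm overcomes shortcomings of the fuzzy c-means algorithm and improves its performance. Experimental results show that it clusters node data well.

4. Because the spatially constrained fuzzy c-means algorithm has low resolution for overlapping objects at class boundaries, a rough fuzzy c-means clustering algorithm based on spatial constraints is proposed. By introducing the upper and lower approximations of rough sets, the algorithm changes the distribution of the membership function in the spatially constrained fuzzy c-means algorithm and revises the formulas for updating class centers and computing fuzzy memberships. The algorithm overcomes shortcomings of both the spatially constrained fuzzy c-means algorithm and the rough c-means algorithm, reduces computational complexity, and improves the resolution of overlapping objects at class boundaries. Experimental results show a considerable performance improvement over the spatially constrained fuzzy c-means algorithm.

5. Because of its flexible representation, the Gaussian mixture model has become one of the most popular tools for density estimation and clustering. Since sensor networks are dynamic, the number of components of the mixture is hard to determine in advance; in addition, clustering based on Gaussian mixture models does not take the location information of sensor nodes into account. To address these two problems, a Gaussian mixture model based on spatial information is proposed, which uses the location information of sensor nodes as prior knowledge about the number of components. When the expectation-maximization (EM) algorithm is used to estimate the model parameters, this prior knowledge is used to determine the number of components automatically. Experimental results show that, compared with an ordinary Gaussian mixture model, the EM algorithm based on the proposed model determines the number of components accurately and clusters node data well.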
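The local-clustering stage of the first contribution can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `cell_size` and `value_step` parameters, the choice to treat diagonally touching cells as adjacent, and the cluster summary (size and mean) are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import product
from math import floor


def grid_local_clusters(readings, cell_size, value_step):
    """Group (x, y, value) readings into connected regions of grid cells.

    cell_size and value_step are hypothetical parameters: the edge lengths
    of a cell along the location axes and the sensed-value axis.
    """
    # Assign every reading to a hyper-rectangular cell over (x, y, value).
    cells = defaultdict(list)
    for x, y, v in readings:
        key = (floor(x / cell_size), floor(y / cell_size), floor(v / value_step))
        cells[key].append((x, y, v))

    # Merge non-empty cells that touch (diagonals included, an assumption)
    # using breadth-first search; each connected region is one local cluster.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    seen, clusters = set(), []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cx, cy, cv = queue.popleft()
            members.extend(cells[(cx, cy, cv)])
            for dx, dy, dv in offsets:
                neighbor = (cx + dx, cy + dy, cv + dv)
                if neighbor in cells and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(members)
    return clusters


def summarize(cluster):
    """Compact feature a node could forward to the sink: size and mean (x, y, value)."""
    n = len(cluster)
    return n, tuple(sum(p[i] for p in cluster) / n for i in range(3))
```

Forwarding only the output of `summarize` instead of raw readings is one way to picture how transmitting cluster features rather than data could reduce traffic toward the sink; the features actually used in the thesis are not specified in this abstract.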

【Abstract】 With the rapid development of wireless communication, embedded computing, and microelectronics, Wireless Sensor Networks (WSNs) are being widely used in many fields, such as military defense, environmental monitoring, and transportation. How to process the huge amounts of sensor data in wireless sensor networks efficiently, and how to acquire useful knowledge from them, has become a new challenge. Cluster analysis in data mining is one way to address these problems. However, traditional data clustering methods are difficult to apply directly to sensor networks because sensor nodes have limited resources and sensor data are temporally and spatially correlated. In this thesis, we put forward some new methods and ideas tailored to the characteristics of sensor data in wireless sensor networks. The main contents are outlined as follows:

1. An efficient grid-based distributed dual clustering algorithm is proposed to address the limited resources of sensor nodes and the dual attributes of their data (location information and sensor data). The proposed algorithm consists of two levels of clustering: local clustering and global clustering. First, the data space is divided into hyper-rectangular grid cells according to the locations of sensor nodes and their sensor data. Second, adjacent grid cells whose sensor nodes are connected in location and similar in sensor data are merged, and the features of the resulting local clusters are extracted. Then, these local features are sent to the sink, where global clustering is performed on them. The proposed algorithm reduces the energy consumption of the network by reducing the single-hop communication distance and the amount of data transmitted. The experimental results show that the proposed algorithm clusters sensor data well, scales well with the size of the data set, can deal with large-scale data sets, and can find clusters of arbitrary shape.

2. An efficient dual clustering algorithm based on fuzzy c-means is proposed to handle the dual attributes (location information and sensor data) of sensor data. The proposed algorithm incorporates the position information of the sensor nodes into the conventional fuzzy c-means algorithm and modifies its membership function, which improves the performance of the algorithm. Because the number of classes is difficult to determine in advance, a subtractive clustering algorithm is used to determine the number of classes and the initial cluster centers, which speeds up convergence and avoids falling into a local optimum. Distributed clustering is used to cope with the limited resources of sensor nodes; it reduces the single-hop communication distance of sensor nodes and the amount of data transmitted, thereby reducing network energy consumption. Experimental results show that the algorithm clusters sensor data better and reduces network energy consumption.

3. An efficient fuzzy c-means clustering algorithm based on spatial constraints is proposed to exploit the strong correlation between the sensor data of adjacent nodes. The algorithm borrows an idea from image segmentation and incorporates the spatial information and sensor data of adjacent nodes into the conventional fuzzy c-means algorithm through a novel fuzzy factor. The clustering partitions the input sensor data set into several groups such that each group forms a compact region in the geographic domain while being similar in the non-geographic domain. The proposed algorithm overcomes the disadvantages of the known fuzzy c-means algorithms and at the same time enhances clustering performance. The experimental results show that the algorithm clusters sensor data better.

4. A new rough fuzzy c-means clustering algorithm based on spatial constraints is proposed, because the fuzzy c-means algorithm based on spatial constraints does not handle overlapping clusters and the uncertainty at class boundaries well. The algorithm alters the distribution of the fuzzy membership function by combining the lower and upper approximations of rough sets. The computation of cluster centroids and fuzzy memberships is modified accordingly. The proposed algorithm overcomes the disadvantages of the known fuzzy c-means and rough c-means algorithms, reduces computational complexity, and increases the resolution of overlapping objects at class boundaries. Experimental results show a considerable performance improvement over the fuzzy c-means clustering algorithm based on spatial constraints.

5. The Gaussian mixture model is very popular in density estimation and clustering because of its flexible representation. However, applying the Gaussian mixture model to sensor data clustering faces some difficulties. First, estimating the number of components is still an open question. Second, mixture-based data clustering does not consider the spatial information of the sensor nodes, which is important for obtaining smooth regions in the clustering results. A Gaussian mixture model based on spatial information is therefore proposed. The spatial information is used as prior knowledge about the number of components. An algorithm based on expectation maximization (EM) is developed that estimates the parameters of the proposed model using this prior knowledge and automatically determines the number of components. Experimental results show that the EM-based algorithm estimates the number of components accurately and clusters sensor data better.
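The idea behind the second contribution, running fuzzy c-means on location and sensor data together, can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the thesis: the membership and center updates are the standard fuzzy c-means ones, the blended distance with a hypothetical `weight` parameter stands in for the thesis's modified membership function (whose exact form is not given in this abstract), and the initial centers are passed in by the caller rather than derived by subtractive clustering.

```python
def dual_fcm(points, centers, m=2.0, weight=0.5, iters=50):
    """Fuzzy c-means on (x, y, value) tuples with a blended squared distance.

    weight (hypothetical) balances squared location distance against squared
    sensed-value distance; it is an illustrative choice, not from the thesis.
    Returns the final centers and the memberships from the last iteration.
    """
    def dist2(p, c):
        spatial = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
        sensed = (p[2] - c[2]) ** 2
        # The small constant avoids division by zero when a point sits on a center.
        return weight * spatial + (1.0 - weight) * sensed + 1e-12

    centers = [tuple(c) for c in centers]
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1)).
        memberships = []
        for p in points:
            inv = [(1.0 / dist2(p, c)) ** exponent for c in centers]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Standard FCM center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            w = [u[k] ** m for u in memberships]
            s = sum(w)
            new_centers.append(
                tuple(sum(wi * p[j] for wi, p in zip(w, points)) / s for j in range(3))
            )
        centers = new_centers
    return centers, memberships
```

With `weight=1.0` the sketch clusters on location alone and with `weight=0.0` on sensed values alone, which gives a simple way to see how the two attributes trade off in a dual clustering.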
