节点文献

基于属性扩展图的K-means聚类算法的研究

Research on the K-means Clustering Algorithm Based on Attribute Augumented Graph

【作者】 任巍英

【导师】 高媛;

【作者基本信息】 中北大学 , 计算机软件与理论, 2012, 硕士

【摘要】 社团结构是社会网络普遍存在的拓扑特性之一,发现社会网络中的社团结构是复杂网络研究的基础性问题。聚类算法是发现社团结构的一种重要的方法。聚类分析技术在过去的许多年中得到了广泛的研究,其中K-means聚类算法是众多聚类算法中比较经典的一个。K-means聚类算法由于思想简单、时间复杂度小而被广泛的进行了研究与运用,尤其在对大规模数据进行挖掘中,K-means聚类算法具有高效性及可伸缩性。真实网络中,除了数据之间存在的拓扑结构以外,其数据本身存在着各种特殊属性。现存的许多聚类算法仅仅依靠数据间的拓扑结构进行聚类,而很大程度的忽略了数据所具有的特有属性在聚类分析中的作用。本文在分析聚类算法中节点的拓扑结构及特有属性的作用之后,对K-means聚类算法进行改进,提出了一种新的聚类算法-SAK聚类算法。本文的主要研究成果如下:(1)将真实网络用图的模型表示,并根据现实网络的实用性,将节点的属性特性作为节点添加到图中,并根据节点与属性的关系添加相应的边,从而构成属性扩展图。在属性扩展图的基础上,使用随机行走模型对节点的结构及属性相似性进行统一的测量。(2)提出了一种自动更新权重值的方法,在聚类算法不断迭代的过程中,节点边的权重会随之发生变化,随着权重的改变节点间的相似度也会随之改变,这样,不同的属性将会在聚类算法中起到不同的作用。这种改变将会使节点间相似度的测量更加趋于实际,趋于准确。(3)提出基于属性扩展图的K-means聚类算法(SAK),该算法改变K-means算法随机选取初始聚类中心的方法,采用密度函数的方法进行初始聚类中心的选取。并运用两个真实的社会网络对本文所提出的SAK聚类算法进行了验证。

【Abstract】 Community structure is one of the common topological characteristics of social networks.Community detection has become a fundamental problem in the research field of socialnetworks. The clustering algorithm is an important way to find community structure. Theclustering analysis technique has been widely studied in the past many years, one of theclustering algorithm is k-menas clustering algorithm, which is the most classical. Because ofsimple thinking and short time complexity , k-means is wide studied and applied. Especiallythe k-means algorithm is scalable and high efficient in large-scale database.In the real network, there is topological structure between datas, and there are specialproperties too. Many existing clustering methods mainly focus on the topological structure,but largely ignore the data properties, which are often important in the clustering algorithm.The topological structure and properties of data play an important role on the clusteringanalysis.In this paper, we analyze the important of topological structure and properties in theclustering algorithm, then impore the K-means clustering algorithm. We propose a newclustering algorithm (SAK). The main contributions of this paper are summarized below:(1) We use graph model to said real network. In accordance with the practicality of thereal network, we add the properties characteristic of the node as a node in the graph, and addthe edges in accordance with the relationship between the nodes and the attribute. Then theattribute augment graph is constituted. A random walk model is used to measure the vertexcloseness through the topological structure and properties based on the attribute augmentgraph.(2)We propose a weight self-adjustment method. In the iterative process of clusteringalgorithm, the weight of edges and the vertex closeness is updated too. The different attriburewill play different role in the clustering algorithm. The change will make the measurement ofsimilarity become more practical and accurate. (3) The k-means clustering algorithm based on the attribute augument graph is putforward. The clustering algorithm change the method of randomly selecting the initialclustering centers on the k-means clustering algorithm. The density function method is used toselect the initial clustering centers. We use two real network to verify the clustering algorithm.

  • 【网络出版投稿人】 中北大学
  • 【网络出版年期】2012年 08期
节点文献中: