
Research on Gravity-Based Clustering and Its Applications

(Original English title: The Research of Clustering Based on Gravity and Its Application)

【Author】 Zha Feng (查丰)

【Advisor】 Jia Ruiyu (贾瑞玉)

【Author Information】 Anhui University, Computer Application Technology, 2011, Master's thesis

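【Abstract (translated from the Chinese)】 Data mining has become a popular area of applied computing in recent years, and clustering is one of its important research branches. Clustering groups unlabeled samples by similarity so that similarity is maximized within a cluster and minimized between clusters, thereby revealing the characteristics and internal patterns of a data set. However, the structure and distribution of some data sets are highly complex, and data mining has raised many open problems for clustering, so there is still considerable room for further research on cluster analysis.

Hierarchical clustering is a commonly used family of clustering methods that builds a hierarchy by decomposing the target data set. Depending on the direction of decomposition, it is either bottom-up (agglomerative) or top-down (divisive). The covering algorithm is a constructive learning algorithm that finds a set of covers such that samples of the same class fall in the same cover and samples of different classes do not share a cover. The covering clustering algorithm borrows this constructive idea: it finds a set of covers such that distances between samples in the same cover are small and distances between samples in different covers are large.

In the universe we live in, all matter was in a disordered, chaotic state after the initial Big Bang. Under universal gravitation, matter attracted and approached other matter and eventually merged into galaxies, stars, planets, and other celestial bodies. This process closely resembles data clustering: both begin in chaos and, by applying some clustering operation to the individuals in it, arrive at a clearly structured result. Motivated by this similarity, we incorporate universal gravitation into clustering algorithms and change the similarity measure from distance alone to the ratio of distance to cluster size.

This thesis studies the hierarchical clustering algorithm (Hierarchical Clustering, HC) and the covering clustering algorithm (Covering Clustering Algorithm, CCA). In both, gravity replaces distance in the similarity formula, yielding Hierarchical Clustering Based on Gravity (HCBG) and Covering Clustering Based on Gravity (CCBG). Experimental results show that using gravity as the similarity measure gives some improvement in the clustering results.

Customer Relationship Management (CRM) tightly combines best business practices with data mining, data warehousing, one-to-one marketing, sales automation, and other information technologies, providing a business automation solution for sales, customer service, decision support, and related areas. Customer segmentation is an important research topic in CRM: customers are classified effectively and targeted sales strategies are applied in order to maximize sales profit. The two most important steps in customer segmentation are data mining and decision support. Data mining uses clustering to find customers with similar behavior; decision support uses methods such as Bayesian classification and decision trees to predict a customer's behavior from that customer's personal data. This thesis uses the gravity-based hierarchical clustering algorithm in the data mining step and a naive Bayes classifier to predict customer behavior.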

【Abstract】 Data mining has been an important area of applied computing in recent years, and data clustering is an important branch of data mining. Clustering separates unclassified samples into groups according to their similarity, so that similarity is high within a group and low between groups, and thereby uncovers the internal properties and patterns of the data. However, the structure and distribution of some data sets are highly complex, and data mining raises many problems that clustering still needs to solve. There is therefore still considerable room for further study of such approaches.

Hierarchical clustering is a common clustering method that creates a hierarchy by decomposing a given set of data objects. Based on the direction of decomposition, hierarchical clustering can be divided into two methods: bottom-up (agglomerative) and top-down (divisive).

The cover algorithm is a constructive learning algorithm: it finds a group of covers such that samples of the same type belong to the same cover and samples of different types belong to different covers. Drawing on the constructive idea of the cover algorithm, the cover clustering algorithm tries to find a group of covers such that distances within a cover are small and distances between different covers are large.

After the initial Big Bang, all matter in the universe was in a chaotic state. Under the action of gravity, matter in the universe attracted other matter and then fused to form galaxies, stars, planets, and other celestial bodies. This process is very similar to clustering, in which some clustering computation turns chaotic data into a clearly structured result. Because of this similarity, we improve the similarity measure by bringing gravity into the clustering algorithm, moving from simple distance as the similarity to a measure that takes cluster size as one of its parameters.

This thesis studies the Hierarchical Clustering Algorithm (HC) and the Covering Clustering Algorithm (CCA). In both algorithms, gravity is used instead of distance as the similarity, giving Hierarchical Clustering Based on Gravity (HCBG) and Covering Clustering Based on Gravity (CCBG). The results show that using gravity as the similarity can improve clustering quality.

Customer Relationship Management (CRM) is a management philosophy as well as a body of management software and technology. CRM brings together best business practices, data mining, data warehousing, one-to-one marketing, sales automation, and other information technologies. It provides a business automation solution that helps companies sell products and helps managers make decisions. Customer segmentation is an important research direction in CRM: by classifying customers effectively and applying targeted marketing strategies, it aims to maximize sales profit. In customer segmentation, the two most important steps are data mining and decision support. Data mining tries to find clusters of customers with similar behavior; decision support uses Bayesian classification, decision trees, and other methods to predict a customer's behavior from that customer's personal data. This thesis uses the HCBG algorithm proposed in its third chapter for data mining and predicts customer behavior with Bayesian classification methods.
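
The idea of replacing distance with gravity in agglomerative clustering can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formula, so the attraction used here is an assumption modeled on Newton's law: cluster sizes play the role of masses, and the squared distance between centroids is the denominator. The function names `centroid`, `gravity`, and `gravity_agglomerative` are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def gravity(a, b, eps=1e-12):
    """Assumed attraction between clusters a and b: |a| * |b| / d^2.

    d is the Euclidean distance between the two centroids; eps avoids
    division by zero when the centroids coincide.
    """
    d2 = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (d2 + eps)


def gravity_agglomerative(points, k):
    """Bottom-up clustering: merge the most attracted pair until k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # combinations yields index pairs with i < j
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: gravity(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for cluster in gravity_agglomerative(data, 2):
        print(sorted(cluster))
```

Compared with a purely distance-based merge rule, this assumed measure favors merging into larger clusters when distances are similar, which is one plausible reading of "cluster size as one parameter of similarity".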

  • 【Online Publication Contributor】 Anhui University
  • 【Online Publication Issue】 2012, Issue 04