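Research on Density-Based Clustering Algorithms and Their Application to Telecom Customer Segmentation

【English title as published】 A Clustering Algorithm Based on Density with Its Application in the Customer Cluster in the Field of Telecom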

【Author】 Chen Yuanyuan (陈园园)

【Advisor】 Chen Zhiping (陈治平)

【Author information】 Hunan University, Computer Software and Theory, 2008, Master's thesis

【Abstract】 With the rapid development of the telecommunications market, telecom customers are becoming increasingly segmented and diversified, and the competition and growth opportunities of telecom companies are concentrating in individual market segments. To keep a leading market position and continue to raise customer value, operators must actively segment their customers. How to use data mining methods effectively for customer segmentation is therefore a popular research topic in applied data mining, and one with significant practical value. This thesis presents a fairly comprehensive comparative study of clustering, one of the basic data mining techniques, and uses an improved clustering algorithm to segment telecom customers. The goal is to identify groups of customers with similar characteristics as a basis for customer analysis and market strategy. The main work and contributions are:

1) To address the inability of density-based clustering methods to handle data whose density is unevenly distributed, a clustering algorithm based on representative points and point density (CBRD) is proposed. The algorithm takes the average density of a cluster's representative points as the cluster density and the k nearest neighbors of a representative point as its representative region. Points in a representative region that satisfy the density threshold, which is derived from the cluster density, are selected as new representative points, and the newly selected points are then used to adjust the cluster density. This process repeats until all representative points and representative regions have been found. All representative points whose regions are connected, together with their representative regions, form one cluster; points that belong to no cluster are treated as noise. Experimental results show that the method can find clusters of arbitrary shape and uneven density.

2) Although CBRD can find clusters of arbitrary shape, on large data sets it consumes considerable memory and I/O, which limits its usefulness for customer segmentation. Building on the ideas of CBRD, the thesis therefore proposes an efficient density clustering algorithm based on overlapping data partitions. It retains CBRD's ability to find arbitrarily shaped clusters of uneven density while also achieving relatively high running efficiency.

3) The improved density clustering algorithm is applied to telecom customer segmentation, helping enterprises track market dynamics and providing technical support for identifying potential customers. Experimental results confirm the effectiveness of the clustering algorithm.
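
The CBRD procedure described in item 1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not define point density, the density threshold, or how a cluster is started, so the sketch assumes: point density is the inverse of the mean distance to the k nearest neighbors; the threshold is a fraction `alpha` of the current cluster density; each cluster is seeded from the densest unvisited point; and tentative clusters smaller than `min_size` are discarded as noise. The names `k_nearest`, `cbrd_sketch`, `alpha`, and `min_size` are hypothetical.

```python
import math
import random

UNVISITED, NOISE = -2, -1


def k_nearest(points, k):
    """Brute-force k nearest neighbours: per point, a sorted list of (distance, index)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def cbrd_sketch(points, k=5, alpha=0.5, min_size=5):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE (-1)."""
    knn = k_nearest(points, k)
    # Assumption: point density = 1 / (mean distance to the k nearest neighbours).
    density = [k / (sum(d for d, _ in nb) + 1e-12) for nb in knn]
    labels = [UNVISITED] * len(points)
    next_id = 0
    # Assumption: each cluster starts from the densest point not yet visited.
    for seed in sorted(range(len(points)), key=lambda i: -density[i]):
        if labels[seed] != UNVISITED:
            continue
        reps, members, frontier = {seed}, {seed}, [seed]
        rep_density_sum = density[seed]
        while frontier:
            rep = frontier.pop()
            # Representative region = the k nearest neighbours of the representative.
            for _, q in knn[rep]:
                if labels[q] != UNVISITED:
                    continue
                members.add(q)
                # Cluster density = mean density of the current representatives.
                cluster_density = rep_density_sum / len(reps)
                # Assumption: the threshold is alpha times the cluster density.
                if q not in reps and density[q] >= alpha * cluster_density:
                    reps.add(q)
                    frontier.append(q)
                    rep_density_sum += density[q]  # adjusts the cluster density
        # Assumption: tentative clusters below min_size are treated as noise.
        if len(members) >= min_size:
            label, next_id = next_id, next_id + 1
        else:
            label = NOISE
        for m in members:
            labels[m] = label
    return labels


# Hypothetical demo data: a tight blob, a loose blob, and one far-away point.
rng = random.Random(0)
dense = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(40)]
sparse = [(rng.gauss(20, 1.0), rng.gauss(20, 1.0)) for _ in range(40)]
labels = cbrd_sketch(dense + sparse + [(-50.0, -50.0)])
```

Because the threshold is relative to each cluster's own density rather than a single global value, the tight and the loose blob can each grow under their own density level, which is the property the abstract claims for data with uneven density. The brute-force neighbor search is quadratic in the number of points and is only suitable for small illustrations.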

【Key words】 Data Mining; Customer Segmentation; Clustering; Density; Overlapping Partition
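  • 【Online publication contributor】 Hunan University
  • 【Online publication year/issue】 2008, Issue 12
  • 【Classification code】 TP311.13
  • 【Citation count】 4
  • 【Download count】 350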