
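Research on a Cell-Based Outlier Algorithm and Construction of a Customer Loyalty Analysis System

【Author】 Sun Rencheng

【Supervisor】 Shao Fengjing

【Author Information】 Qingdao University, Computer Software and Theory, 2003, Master's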

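【Chinese Abstract】 Data mining is a new technology that has been developing since the 1980s. Its main purpose is to extract implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world application data. Outlier analysis is one of the important research areas in data mining. Its role is to discover the "small patterns" in data, that is, the objects in a dataset that differ markedly from the rest of the data. After nearly 20 years of development, data mining has matured in theoretical research and keeps extending its range of applications. It is currently used in many fields, including telecommunications, finance, commerce, weather forecasting, DNA analysis, the stock market, intrusion detection, and customer classification. This thesis therefore proceeds in three steps. First, it studies the cell-based outlier detection algorithm, identifies its shortcomings, and improves it. Second, it uses this algorithm together with other data mining techniques to build a customer loyalty analysis system that addresses customer loyalty analysis in enterprises. Finally, it analyzes the customer loyalty of the Haier Group using Haier's customer relationship data.

First, the thesis describes the research background of the topic and its significance. It analyzes the state of data mining research in China and abroad, covering both theoretical and applied work. By analyzing the general process of knowledge discovery, it presents the overall architecture of a typical data mining system, analyzes the main function of each module, and explains the data mining techniques used in it in detail.

Second, it reviews the history and current state of outlier detection research. It introduces the main algorithms for distance-based, density-based, and deviation-based outlier detection, as well as for outlier detection in high-dimensional data. It analyzes the main content of each algorithm and, on this basis, compares their strengths, weaknesses, and scopes of application.

Third, building on the cell-based outlier detection algorithm, it proposes an outlier analysis algorithm that reduces boundary effects. To address the misjudgment of outliers at the boundary, it does the following:

- gives a method for partitioning the data space into cells and assigning data objects to them;
- defines a function that dynamically adjusts the dataset boundary threshold;
- proposes an improved cell-based outlier mining algorithm that greatly reduces misjudgment at the boundary without increasing the time complexity of the original algorithm.

The effectiveness of the algorithm is demonstrated through practical applications. Finally, the algorithm is applied to edge extraction in color face images, with excellent results.

Fourth, it implements the customer loyalty analysis system. It first defines customer loyalty and explains why studying it matters to enterprises. It then introduces the system's main functions: data preprocessing, key-customer discovery, and customer loyalty classification. It analyzes in detail the preprocessing techniques and methods used in the data preprocessing module. It presents the data mining techniques used in the key-customer discovery and customer loyalty classification modules (outlier analysis, cluster analysis, and classification/prediction analysis) and describes the corresponding algorithms in detail. Finally, it introduces the two methods the result visualization module uses to display results: parallel coordinates and categorical charts.

Fifth, it uses the customer loyalty analysis system to analyze the customer loyalty of the Haier company. Focusing on Haier's customer loyalty problem, it describes in detail how the data for the analysis were selected and processed. It analyzes the process and results of key-customer discovery at Haier and compares the results in detail under different parameter settings. From this comparison it summarizes how parameter changes affect key-customer discovery. In addition, it uses cluster analysis to obtain the approximate categories in Haier's customer data and selects suitable data objects from these categories to form a training set. A neural network prediction algorithm then produces the final loyalty categories for Haier's customer relationship data. These results demonstrate the practicality of the customer loyalty analysis system.

Finally, the thesis summarizes its work and discusses prospects for future research.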
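Cell-based outlier detection speeds up the distance-based outlier notion reviewed in part two. That notion can be illustrated with a brute-force nested-loop check. This is a minimal Python sketch that assumes Knorr and Ng's DB(p, D) definition, since the abstract does not state the exact definition the thesis uses. It also assumes the cell-based algorithm in question is Knorr and Ng's. The function name `db_outliers` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def db_outliers(points: Sequence[Point], p: float, D: float) -> List[int]:
    """Return indices of DB(p, D)-outliers by brute force (hypothetical helper).

    Assumed definition (Knorr and Ng): an object o is an outlier if at least
    a fraction p of all objects (o included) lie farther than D from o.
    """
    n = len(points)
    # Equivalent test: at most n - p*n objects (o itself included) lie within D.
    max_near = n - p * n
    outliers: List[int] = []
    for i, o in enumerate(points):
        near = 0
        for q in points:
            if math.dist(o, q) <= D:
                near += 1
                if near > max_near:
                    break  # too many neighbours: o cannot be an outlier
        else:
            # The inner loop finished without breaking: few enough neighbours.
            outliers.append(i)
    return outliers
```

Each object may be compared with every other object, so the worst case is quadratic in the number of objects. The early `break` only saves work for objects in dense regions.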

【Abstract】 Data mining is a new technique that has developed since the 1980s. It aims to extract implicit, previously unknown, and potentially useful knowledge from voluminous, incomplete, fuzzy, and stochastic data. Outlier analysis is an important part of data mining research. Its purpose is to find the "small patterns" in a dataset, where an outlier is an object that is considerably dissimilar to, or inconsistent with, the remainder of the data. After 20 years of development, data mining techniques have become increasingly mature in theory and are expanding their application areas. Data mining is now used in telecommunications, finance, business, weather forecasting, DNA analysis, the stock market, intrusion detection, customer segmentation, and other fields. In this thesis, we first study the cell-based outlier detection algorithm, point out its shortcomings, and improve on them. We then design a customer loyalty analysis system based on this algorithm and other data mining techniques. Finally, we analyze the customer loyalty of the Haier company based on its customer relationship data.

First, we describe the research background and point out its significance. The state of data mining research in China and abroad is analyzed from both theoretical and applied perspectives. After analyzing the general process of knowledge discovery, we present a classic framework for a data mining system, analyze the main function of each module, and explain the data mining techniques it uses in detail.

Second, the history and current state of outlier detection research are reviewed. Distance-based, density-based, and deviation-based outlier detection algorithms, as well as algorithms for high-dimensional data, are introduced and analyzed, and their advantages and disadvantages are compared.

Third, building on the cell-based outlier detection algorithm, an outlier analysis algorithm that reduces boundary effects is presented. Cell-based outlier mining algorithms can misjudge outliers at the boundary of the dataset. To address this, the thesis discusses methods for partitioning the data space into cells and allocating data objects to those cells. A dynamic adjustment function for the dataset boundary threshold is then defined, and an improved cell-based outlier algorithm is proposed. The improved algorithm greatly reduces the number of misjudged boundary outliers without increasing the complexity or running time of the original algorithm. Its validity has been verified on several examples. Finally, we applied the algorithm to edge extraction in color face images, with satisfactory results.

Fourth, a customer loyalty analysis system is designed. Customer loyalty is defined, and the significance of studying it is explained. The functions of the system are then described: data preprocessing, key-customer finding, and customer loyalty partitioning. The methods used in the data preprocessing module are discussed. The data mining techniques used in the key-customer finding and customer loyalty partitioning modules are presented, together with the algorithms those techniques use. Finally, the result visualization module is introduced, which provides two methods: parallel coordinates and categorical charts.

Fifth, the customer loyalty of the Haier company is analyzed with the customer loyalty analysis system. The process and methods used to select and process the data for this analysis are discussed. The results of key-customer finding are analyzed and compared under different parameters, and the pattern relating parameter changes to key-customer finding is derived. In addition, the classes in Haier's customer data are obtained by clustering. Suitable data objects are then chosen as the training set, and a neural network prediction algorithm produces the final loyalty classes of the customer relationship data. These results verify the validity of the customer loyalty analysis system.

Finally, all the results are summarized, and prospects for further study are discussed.
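The cell partitioning and object allocation that part three builds on can be sketched as follows. This minimal Python sketch assumes the classic Knorr and Ng cell-based scheme:

- cells have side D/(2√k) in k dimensions;
- a layer-1 ring is one cell thick;
- a layer-2 ring extends out to ⌈2√k⌉ cells.

It does not reproduce the thesis's dynamic boundary-threshold adjustment function, because the abstract does not give its form. The function `cell_based_outliers` and the parameter `M` are hypothetical. `M` is the maximum number of objects, the object itself included, allowed within distance D for that object to count as an outlier.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def cell_based_outliers(points: Sequence[Point], D: float, M: int) -> List[int]:
    """Return indices of outliers found with a Knorr-Ng style cell grid.

    Assumption (hypothetical helper): an object is an outlier if at most M
    objects (itself included) lie within distance D of it.
    """
    if not points:
        return []
    k = len(points[0])
    side = D / (2 * math.sqrt(k))    # any two objects in one cell are < D/2 apart
    r = math.ceil(2 * math.sqrt(k))  # objects more than r cells away are > D away

    # Allocate each object to the cell that contains it (non-empty cells only).
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, pt in enumerate(points):
        cells[tuple(math.floor(x / side) for x in pt)].append(idx)

    outliers: List[int] = []
    for cell, members in cells.items():
        layer1: List[int] = []  # objects in adjacent cells: always within D
        layer2: List[int] = []  # objects 2..r cells away: must be checked
        for offset in product(range(-r, r + 1), repeat=k):
            ring = max(abs(o) for o in offset)
            bucket = cells.get(tuple(c + o for c, o in zip(cell, offset)))
            if ring == 0 or not bucket:
                continue
            (layer1 if ring == 1 else layer2).extend(bucket)

        near = len(members) + len(layer1)  # guaranteed neighbours, self included
        if near > M:
            continue                        # rules 1-2: whole cell is non-outlier
        if near + len(layer2) <= M:
            outliers.extend(members)        # rule 3: whole cell is outlier
            continue
        for i in members:                   # rule 4: check layer-2 objects one by one
            close = sum(1 for j in layer2 if math.dist(points[i], points[j]) <= D)
            if near + close <= M:
                outliers.append(i)
    return sorted(outliers)
```

Only rule 4 computes distances, and only against layer-2 objects. Most cells are therefore decided by counts alone. Storing only non-empty cells in a dictionary keeps memory proportional to the number of objects rather than to the size of the full grid.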

  • 【Online Publication Submitter】 Qingdao University
  • 【Online Publication Year/Issue】 2004, Issue 01
  • 【Classification Number】 TP311.13
  • 【Times Cited】 12
  • 【Downloads】 319