一种启发式贝叶斯分类算法及其在铁路货运客户细分中的应用研究

Research of a Heuristic Bayesian Classification Algorithm and Its Application in Railway Freight Customer Segmentation

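【Author】 Guo Yusong (郭雨松)

【Advisor】 Zhong Yan (钟雁)

【Author information】 Beijing Jiaotong University, Systems Analysis and Integration, 2008, Master's thesis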

【Abstract】 Data mining comprises methods and techniques for discovering latent patterns in large volumes of data and extracting useful knowledge from them. In recent years it has attracted wide attention and has become one of the active research topics in information systems and computer science.

As a classification technique in data mining, a Bayesian network is a graphical model that represents the probabilistic relationships among variables. It offers a natural way to represent causal information, is one of the most effective theoretical models for representing and reasoning about uncertain knowledge, and plays an increasingly important role in the design and analysis of machine learning algorithms.

This thesis surveys the state of research on Bayesian networks and analyzes the theoretical foundation of Bayesian classifiers together with three classical classifiers: the naive Bayes classifier, the Bayesian network classifier, and the TAN classifier. On this basis it proposes a heuristic Bayesian classification algorithm that combines the strengths of the K2 search algorithm and the TAN classifier and, to some extent, compensates for the weaknesses of each. The order of the edges is determined while the TAN classifier constructs its maximum weighted spanning tree; the nodes are then ordered according to certain rules; finally, the K2 search algorithm builds the Bayesian network structure. Experimental results show that the algorithm produces a more reasonable network structure and achieves higher classification accuracy.

Given the increasingly wide use of data mining in customer relationship management (CRM), the thesis also proposes a customer segmentation scheme for railway freight, which applies clustering and classification to mine the information hidden in the massive data of the railway waybill database. Historical freight data are first analyzed by clustering, and a Bayesian classifier then assigns new customers to classes according to the clustering result. This segmentation method can provide a basis for decisions by railway freight marketing departments and thereby improve customer relationship management and decision-making in railway enterprises.

In addition, building on the study of Bayesian classification algorithms and the practical needs of railway freight customer segmentation, a Bayesian classification software platform was developed; it can serve as a general-purpose data mining platform in related fields.
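The ordering step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions. Edge weights are the conditional mutual information between attribute pairs given the class (the usual TAN weighting). The spanning tree is grown Kruskal-style, so "edge order" here means the order in which edges are accepted. The node-ordering rule (first appearance in that edge sequence) is a hypothetical stand-in, because the abstract does not state the actual rules. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) from three equal-length lists of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # P(x,y,c) * log(P(x,y|c) / (P(x|c) * P(y|c))), expressed with raw counts.
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        total += (count / n) * log(ratio)
    return total


def spanning_tree_edge_order(columns, labels):
    """Build a maximum weighted spanning tree over the attributes and
    return its edges in the order they were accepted (heaviest first)."""
    names = list(columns)
    weighted = sorted(
        ((conditional_mutual_information(columns[a], columns[b], labels), a, b)
         for a, b in combinations(names, 2)),
        key=lambda item: item[0],
        reverse=True,
    )
    parent = {name: name for name in names}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for _, a, b in weighted:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # accept the edge only if it joins two components
            parent[root_a] = root_b
            edges.append((a, b))
    return edges


def node_order_from_edges(edges, names):
    """Assumed rule: order nodes by first appearance in the edge sequence."""
    order = []
    for a, b in edges:
        for v in (a, b):
            if v not in order:
                order.append(v)
    # Attributes that appear in no edge (e.g. a single-attribute data set).
    order.extend(v for v in names if v not in order)
    return order
```

The resulting node order would then be passed to a K2 search as its required variable ordering, so that each node may only choose parents from the nodes that precede it.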

  • 【Classification code】TP18;U294.1
  • 【Times cited】7
  • 【Downloads】313