
Research on Prediction Analysis and Feature Extraction Methods for Train Ticket Data

【Author】 吕晓艳 (Lü Xiaoyan)

【Supervisor】 叶阳东 (Ye Yangdong)

【Author information】 Zhengzhou University, Computer Software and Theory, 2004, Master's thesis

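【Abstract (Chinese)】 With the development of railway information technology, the passenger ticket marketing system, a subsystem of the railway information system, has accumulated a wealth of data. How to make sound use of the existing ticket information resources at low labor and technical cost to obtain valuable decision information is increasingly an urgent need of railway decision-making departments and a key task for the railway ticket marketing and IT departments. The rapid development of data mining has laid a solid theoretical foundation for in-depth analysis of railway ticket marketing. However, existing data mining tools have limitations when faced with massive-scale ticket data and the practical requirements of the railway setting: they cannot be used directly and must be adapted to the application's needs.

Oriented toward the analysis of railway ticket marketing requirements, set against the background of railway passenger transport, and taking the characteristics of ticket data into account, this thesis conducts in-depth research and extensive application experiments on how to build effective data analysis models for railway ticket data. Its theoretical starting points are decision tree induction, a classification method in data mining, and concept description; its aim is to establish sound data analysis methods for ticket data. Various decision tree classification algorithms, in particular ID3, SLIQ, and SPRINT, are studied in detail. Based on this analysis, and addressing the shortcomings of the prediction methods currently used in the railway ticket marketing system, an improved decision tree method, TTDTPA, is proposed. TTDTPA is not restricted by main-memory size, can extract quantitative rules that describe the distribution of the main classes, and is easy to parallelize, so it meets the needs of railway passenger marketing analysis more effectively. The study also models the ticket data with naive Bayes and with a method based on equivalence-class partitioning, with the aim of improving the overall performance of the analysis. The latter in particular can extract the features of minority-class data in the dataset, effectively compensating for TTDTPA's limitation in this respect. Finally, by summarizing the results of applying these methods in practice and considering their respective characteristics, the thesis presents a method for building data analysis models for real ticket data.

This research builds experience in applying data mining to ticket data, laying a foundation and providing some theoretical guidance for further study. In addition, applying effective data mining techniques to railway ticket marketing analysis and building sound predictive models gives railway departments accurate decision information and advanced forecasting tools for allocating transport capacity and organizing management scientifically.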
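The abstract names ID3 among the decision tree algorithms studied. As a minimal sketch of ID3's standard split criterion (information gain), not of TTDTPA itself, the following assumes a small hypothetical ticket table whose fields `period`, `seat`, and `full` are invented for illustration:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """ID3 gain: entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attrs, target):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records (not data from the thesis): is a train run fully booked?
rows = [
    {"period": "holiday", "seat": "hard", "full": "yes"},
    {"period": "holiday", "seat": "soft", "full": "yes"},
    {"period": "normal", "seat": "hard", "full": "no"},
    {"period": "normal", "seat": "soft", "full": "no"},
]

print(best_attribute(rows, ["period", "seat"], "full"))  # → period
```

This in-memory version is only illustrative. SLIQ and SPRINT, also studied in the thesis, score candidate splits with the gini index instead of information gain and keep per-attribute lists on disk so that training sets larger than main memory can be handled.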

【Abstract】 With the development of information technology in China's railways, a large amount of ticket data has been collected in the China Railway Train Ticket System (CRTTS), a subsystem of the China Railway information system. How to efficiently extract valuable decision information from this huge volume of ticket data at low human and technical cost has become an urgent need for the railway decision-making departments and a key concern for the railway information departments. The rapid development of data mining techniques has established a solid theoretical foundation for further research on railway ticketing analysis, but existing data mining methods have limitations when applied to huge datasets in the railway setting, so generic methods must be adapted to the application's needs. Taking railway passenger traffic as the study background and focusing on train ticketing requirements, we carry out in-depth research and extensive application experiments on how to build efficient data analysis models on the ticket dataset in CRTTS. Decision tree induction and concept description in data mining are the theoretical starting points of this study, which aims to build rational and efficient models for analyzing train ticket datasets. First, after a detailed analysis of current classification algorithms, especially ID3, SLIQ, and SPRINT, and in light of the requirements of decision analysis and the limitations of the current prediction methods in CRTTS, a new decision-tree-based method, TTDTPA, is presented. TTDTPA is not limited by memory size, can extract instructive rules that combine the advantages of prediction and statistics, and is easy to parallelize.
TTDTPA is therefore suitable for supporting decision-makers' multi-level requirements for predictive analysis in CRTTS. Second, to improve the overall analysis, this research also applies two other data analysis methods to the train ticket data: naive Bayes and a new method based on the indiscernibility relation. Application experiments show that the latter can efficiently extract the characteristics of minority categories within the main class, which compensates for TTDTPA's limitation in this respect. Finally, based on an inductive analysis of these methods and on the application background, the last part of the thesis gives an instructive method for building analysis models on train ticket data. This study is an effective exploration of applying data mining techniques and lays a favorable groundwork for further research on data analysis in CRTTS. The improved methods can build efficient predictive models that help decision-makers understand the state of railway transportation and obtain multi-aspect, multi-level analyses of train ticket data.
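The method "based on the indiscernibility relation" rests on the rough-set idea of equivalence classes: objects that agree on every chosen attribute cannot be told apart. The sketch below uses only the textbook definitions (the partition U/IND(B) and the lower and upper approximations of a target set), not the thesis's own algorithm; the table, the attribute names, and the minority set `low_load` are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Partition object ids into the equivalence classes of IND(attrs):
    two objects share a class iff they agree on every attribute in attrs."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj_id)
    return list(classes.values())


def lower_approximation(table, attrs, target):
    """Objects certainly in target: union of classes lying entirely inside it."""
    result = set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:
            result |= cls
    return result


def upper_approximation(table, attrs, target):
    """Objects possibly in target: union of classes that overlap it."""
    result = set()
    for cls in indiscernibility_classes(table, attrs):
        if cls & target:
            result |= cls
    return result


# Hypothetical condition attributes for five train runs (toy data, not from the thesis).
table = {
    1: {"period": "holiday", "distance": "long"},
    2: {"period": "holiday", "distance": "long"},
    3: {"period": "normal", "distance": "long"},
    4: {"period": "normal", "distance": "short"},
    5: {"period": "normal", "distance": "short"},
}
low_load = {3, 4}  # hypothetical minority class of under-occupied runs
attrs = ["period", "distance"]

print(lower_approximation(table, attrs, low_load))  # → {3}
print(upper_approximation(table, attrs, low_load))  # → {3, 4, 5}
```

Object 4 appears only in the upper approximation because it is indiscernible from object 5 on these attributes, so a rule that describes the minority class with certainty can be drawn only from the class {3}. A majority-driven decision tree split would tend to ignore such a small class, which is the kind of gap the abstract says this approach fills.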

  • 【Online publication submitter】 Zhengzhou University
  • 【Online publication year/issue】 2004, issue 04
  • 【Classification number】 TP399
  • 【Times cited】 5
  • 【Downloads】 297