道路运输信息系统的数据挖掘方法研究与应用

Research and Application on the Data Mining Method of Road Transport Information System

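【Author】 Zheng Xiaofeng (郑晓峰)

【Advisor】 Xu Jianmin (徐建闽)

【Author information】 South China University of Technology, Traffic Information Engineering and Control, 2014, PhD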

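【Abstract, translated from the Chinese】 Road transport is the largest component of China's comprehensive transportation system, and road transport information systems are important to the management, services, and development of the industry. Data mining of road transport information systems is the key technical means of discovering and using the knowledge latent in road transport data and of enabling deeper applications of these systems. Starting from top-level design, such as the model architecture of road transport information systems, and from the requirements of data mining, and weighing the strengths and weaknesses of existing data mining theories and methods, this dissertation proposes methods suited to mining the various kinds of knowledge in road transport in four areas: association rule methods, classification methods, integrated optimized classification methods, and clustering methods. Each is validated in a practical application system, and all are finally implemented together in the Guangdong Province road transport information system. The main research work and results are summarized as follows:

1. The foundations of data mining for road transport information systems are studied, including model architecture and data warehouse design. Basic designs for data types, data relationships, and the data warehouse are proposed, with emphasis on the design of a typical data mart: the IC card road transport electronic certificate system.

2. Based on a comparative analysis of the essential differences between the classic association rule algorithm Apriori and its optimized variant Eclat, properties are proposed and proved, for the first time, that determine whether candidate sets can be pruned when items serve as prefixes or as suffixes. Combined with the cloud computing programming model MapReduce, a further optimized method for computing frequent itemsets, the parallel NEclat method, is then proposed. Two stages of Map and Reduce functions are designed to parallelize the pruning computation, and the method is validated on vehicle input data from the road transport management information system.

3. The strengths and weaknesses of general classification methods are analyzed, namely the distance-based k-nearest neighbors algorithm, decision trees, and Bayesian classification. Their scope of applicability to data mining in road transport information systems is examined, application methods are proposed, and the methods are validated on practitioner management data from the road transport information system. Then, based on the province-wide public transit smart card (一卡通) application, a cross-region consumption estimation matrix model resembling the BP neural network classification method is built. Key parameters such as the error threshold and learning rate are set according to the actual application, the estimation matrix for cross-region consumption is obtained by training on actual smart card consumption data, and the model is validated on actual test data.

4. Building on the general descriptive theory of classification problems, an abstract model of the classification data mining problem is proposed, and rough set theory is introduced to reveal the essence of this model. Combining the Apriori association rule algorithm with rough set theory, a series of methods is proposed for condition attribute reduction, rule computation, and rule simplification, optimizing the mining of association and classification knowledge. The use of rough set methods to obtain the relationship between the number of rules and the support and confidence is proposed for the first time. The methods are tested on two example problems from the road transport information system: quality and credit assessment, and fuel consumption limits.

5. To address the shortcomings of DBSCAN, a typical density-based clustering algorithm, principles of attribute dimension partitioning and cluster merging are proposed and proved. Combining the three principles, an optimized DBSCAN algorithm based on MapReduce is proposed, with Map and Reduce functions for cluster merging designed to enable parallel computation. The execution efficiency of the new and old algorithms is compared, and the method is validated on a practical satellite positioning application.

6. Starting from the construction of the business, application, data, and technical architecture models of the Guangdong Province road transport information system, data types and characteristics, data relationships, and database planning are discussed in detail. On this basis, the requirements for data mining are comprehensively analyzed and an overall solution approach is proposed. The full data mining process is implemented in the satellite positioning data management subsystem of the Guangdong Province road transport information system using the modeling and analysis tool Cognos.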

【Abstract】 Road transport is the largest component of China's comprehensive transportation, and the road transport information system plays an important role in road transport management, service, and industry development. Data mining of the road transport information system is the key technical means of making deeper use of the system and its data. Starting from the model framework and top-level design of the road transport information system and from data mining needs, and considering the advantages and disadvantages of a variety of data mining theories, this dissertation proposes mining methods for the various kinds of knowledge in road transport in five aspects (statistical analysis, association rules, classification, optimized classification, and clustering) and verifies them in practical application systems. The important findings of this dissertation include:

1. This dissertation researches basic questions of data mining for road transport information systems, such as model architecture and data warehouse design, and proposes designs for data types, data relations, and the data warehouse, focusing on the design and application of a typical data mart, the IC card road transport electronic certificate system.

2. By comparing the conventional association rule algorithm Apriori with its optimized algorithm Eclat and analyzing the differences between them, this dissertation is the first to put forward and prove two properties that determine whether a candidate set can be pruned when an item is used as a prefix or as a suffix. Combined with the cloud computing programming model MapReduce, it puts forward a more optimized frequent itemset calculation method, Parallel NEclat, and designs two stages of Map and Reduce functions to compute the pruning in parallel. The method is verified with vehicle input data examples from the road transport management information system.

3. This dissertation studies and analyzes the strengths and weaknesses of general classification methods in data mining: the k-Nearest Neighbors method, decision tree classification, and Bayesian classification. It analyzes their scope of application to data mining in the road transport information system, presents how the methods are applied, and verifies them on actual personnel management data from the road transport information system. Based on the application of the public transit smart card system, a cross-district consumption matrix model similar to a BP neural network classification method is built. Key parameters such as the error threshold and learning rate are set according to the actual application, the cross-district consumption matrix is obtained by training on actual card consumption data, and the result is verified with actual test data.

4. Based on theoretical research into the general description of the classification problem, an abstract model of the classification data mining problem is proposed, and its essence is revealed using rough set theory. By combining the Apriori association rule algorithm with rough sets, a series of methods is put forward for condition attribute reduction, rule calculation, and rule simplification, which optimizes the mining of association and classification knowledge. This dissertation is the first to propose using rough sets to obtain the relationship between the number of rules and the support and confidence. The methods are tested on quality credibility evaluation and fuel limit value problem instances in the road transport information system.

5. Aiming at the shortcomings of DBSCAN, a typical density-based clustering algorithm, this dissertation proposes and proves principles of attribute dimension partitioning and cluster merging. Combining the three principles, it puts forward an optimized DBSCAN algorithm based on MapReduce, designs Map and Reduce functions for cluster merging, and applies parallel computing. It also compares the execution efficiency of the new and old algorithms and verifies the method in a practical satellite positioning application.

6. Starting from the construction of the business, application, data, and technical architecture models of the Guangdong Province road transport information system, with emphasis on data types and features, data relationships, and database planning, this dissertation comprehensively analyzes the demand for data mining and puts forward an overall solution. Finally, it implements the whole data mining process in the satellite positioning data management subsystem of the Guangdong Province road transport information system using the modeling and analysis tool Cognos.
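
For readers unfamiliar with Eclat, the baseline that item 2 builds on, the following is a minimal sketch of the textbook serial algorithm. It is not the dissertation's Parallel NEclat method: the prefix/suffix pruning properties and the MapReduce stages are not reproduced here. The function name `eclat`, its parameters, and the sample transactions are illustrative assumptions.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Textbook Eclat sketch (hypothetical helper, not the thesis's NEclat).

    Returns a dict mapping each frequent itemset (as a frozenset) to its
    support count, i.e. the number of transactions that contain it.
    """
    # Vertical layout: each item maps to the set of transaction ids (tidset)
    # of the transactions that contain it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs already known to be frequent
        # when combined with the current prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect only with later items so each itemset is built once.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    # Keep only frequent single items, in a fixed order.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [{"bus", "taxi"}, {"bus", "truck"},
              {"bus", "taxi", "truck"}, {"taxi"}]
    for itemset, support in eclat(sample, min_support=2).items():
        print(sorted(itemset), support)
```

The support of a candidate is obtained by intersecting tidsets rather than rescanning the data, which is the main practical difference from Apriori's horizontal scans.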

  • 【CLC number】 TP311.13; U495
  • 【Times cited】 1
  • 【Downloads】 653