
远洋船舶调度数据挖掘技术研究与应用

Study on Oceangoing Ship Scheduling Data Mining Technology and Its Application

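【Author】 Zhu Feixiang (朱飞祥)

【Advisor】 Zhang Yingjun (张英俊)

【Author information】 Dalian Maritime University, Traffic Information Engineering and Control, 2008, Ph.D.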

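【Abstract (translated from Chinese)】 Data mining, an important step in knowledge discovery, is the process of extracting previously unknown, valuable, and actionable relationships, patterns, and trends from large databases and data warehouses for decision support. With the rapid development of ship-shore communication technology and computer storage devices, shipping companies have accumulated massive amounts of ship scheduling data. How to make full use of data mining to analyze the regularities hidden in this data is a problem that deserves attention in intelligent maritime transportation research. This dissertation studies the application of data mining to problems related to oceangoing ship scheduling. Drawing on the characteristics of algorithms for association analysis, data reduction, and decision rule acquisition, it focuses on their application to global port cargo handling analysis, ship route cargo analysis, and ship operating fuel consumption analysis. To make mining more efficient, the structure and application of a ship scheduling data warehouse are discussed and designed, and the warehouse is finally combined with the mining applications to form a ship scheduling data mining system. The main research contents and results are as follows:
(1) Based on a survey of the scheduling operations of Chinese shipping companies, a structural model of a ship scheduling data warehouse is established for different subjects, including global port cargo handling analysis, cargo flow analysis, and ship energy-saving analysis. Its structure, functions, data storage model, and implementation technology are studied so that massive ship scheduling data can be managed and analyzed, providing data support for the subsequent mining algorithms. A ship scheduling data mining system comprising a data layer, an organization layer, a mining layer, and a decision layer is then established. Each layer handles a different stage of the mining task, from data preprocessing through data mining to knowledge representation, forming a complete system.
(2) To address the need to search the data table many times during association rule mining, the connection between rough sets and association rules is analyzed. Inspired by a single-dimensional rough set association algorithm, a multi-dimensional association algorithm based on rough set equivalence classes is proposed, which converts the computation of multi-dimensional frequent itemsets into the computation of equivalence classes over multiple attributes. The frequent itemsets it produces contain only the dimensions the user cares about and exclude interference from other dimensions, so the resulting rules better match user needs. Compared with the Apriori algorithm, it also reduces the number of database scans, which improves efficiency and shortens the mining time.
(3) The application of the multi-dimensional association rule mining algorithm to ship route cargo analysis is studied. Oceangoing cargo transportation is in essence the transfer of cargo through time and space. Because a ship may load several kinds of cargo at one port and discharge them at different ports, the cargo dimension data is converted from a transaction database into an information system. The proposed algorithm is then used to analyze the relationships among ship route, ship type, cargo, and time. The rules obtained, such as the distribution of ship types by route and the cargo flows by route, are of interest to shipping companies and also demonstrate the practicality of the algorithm.
(4) An improved algorithm for computing the positive region is given. The positive region is a fundamental concept in rough set theory: attribute reduction algorithms based on dependency degree and classification quality, as well as the computation of attribute significance, all involve it. Based on a close analysis of the definition of the positive region, the algorithm uses earlier results to promptly delete objects that no longer need to be compared, which greatly reduces the number of object pairs in subsequent steps and therefore the amount of computation. Tests on machine learning data sets from UCI (University of California, Irvine) show that it is clearly more efficient than the classical positive region algorithm, and the gain is larger on large data sets.
(5) Finding all minimal attribute reducts is known to be NP-hard. A heuristic algorithm based on classification ability is proposed that uses attribute diversity as the heuristic. It simplifies the heuristic condition and replaces the positive region computation with a classification ability computation, which makes it more efficient than attribute reduction algorithms based on the positive region. Tests on UCI machine learning data sets show a clear improvement in efficiency over the classical positive region algorithm.
(6) Ship operating fuel consumption is a composite process affected by many factors, so the factors that influence it during operation need to be analyzed. In actual scheduling messages, however, some attribute values of the fuel consumption data are missing, so the data is incomplete. The dissertation therefore first completes the attribute values of the fuel consumption data, then uses the improved positive region algorithm to identify the main fuel consumption factors during operation, and uses a rough set attribute reduction algorithm to reduce the fuel consumption attributes. This yields meaningful decision rules and provides a theoretical basis for setting reasonable energy-saving measures for ship operation.
Finally, the dissertation is summarized and open problems for further study are outlined.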
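
The idea in item (2), computing multi-dimensional frequent itemsets as equivalence classes over the dimensions the user selects, can be illustrated with a small Python sketch. This is not the dissertation's algorithm, whose details the abstract does not give. It only assumes that records are grouped by their values on the chosen dimensions in a single pass, and that each group's size gives the support of the corresponding itemset. The function names, the `voyages` records, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, dims):
    """Partition record indices by their values on the chosen dimensions.

    Records with identical values on every dimension in `dims` fall into
    the same class (one pass over the data, no candidate generation).
    """
    classes = defaultdict(list)
    for idx, rec in enumerate(records):
        classes[tuple(rec[d] for d in dims)].append(idx)
    return dict(classes)


def association_rules(records, antecedent, consequent,
                      min_support, min_confidence):
    """Derive rules antecedent -> consequent from equivalence classes.

    support    = |class on antecedent + consequent| / |records|
    confidence = |class on antecedent + consequent| / |class on antecedent|
    """
    n = len(records)
    lhs_classes = equivalence_classes(records, antecedent)
    joint_classes = equivalence_classes(records, antecedent + consequent)
    k = len(antecedent)
    rules = []
    for key, members in joint_classes.items():
        lhs, rhs = key[:k], key[k:]
        support = len(members) / n
        confidence = len(members) / len(lhs_classes[lhs])
        if support >= min_support and confidence >= min_confidence:
            rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical voyage records; the values are invented for illustration only.
voyages = [
    {"route": "Asia-Europe", "ship_type": "bulk", "cargo": "ore"},
    {"route": "Asia-Europe", "ship_type": "bulk", "cargo": "ore"},
    {"route": "Asia-Europe", "ship_type": "bulk", "cargo": "coal"},
    {"route": "Asia-America", "ship_type": "tanker", "cargo": "oil"},
]

# Only the "route" and "cargo" dimensions take part; "ship_type" is ignored.
rules = association_rules(voyages, ["route"], ["cargo"],
                          min_support=0.25, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(lhs, "->", rhs, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Because the grouping key is built only from the selected dimensions, unselected dimensions cannot appear in the rules, which mirrors the abstract's point about excluding other dimensions.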

【Abstract】 As an important step in knowledge discovery, data mining is the process of extracting unknown, valuable, and actionable relationships, patterns, and trends from large-scale databases and data warehouses for decision support. With the rapid development of ship-shore communication technology and computer storage devices, massive ship scheduling data has emerged in shipping enterprises, so how to make full use of data mining technology to analyze the rules implicit in this data is an issue of concern in intelligent maritime transportation. Drawing on the characteristics of association analysis, data reduction, and decision rule acquisition, this dissertation studies the application of data mining technology to oceangoing ship scheduling problems, and discusses the application of these methods to the analysis of global port cargo handling, of cargo on shipping routes, and of fuel consumption in ship operation. For more effective data mining analysis, this dissertation designs the ship scheduling data warehouse and discusses its applications. The warehouse and the data mining applications are then integrated into a ship scheduling data mining system. The research contents and results of this dissertation are as follows:
(1) Through a study of the scheduling operations of China's shipping companies, this dissertation establishes the different themes of the ship scheduling data warehouse, including global port cargo handling, cargo flows and voyages, and energy saving. The structure, model, functions, data storage model, and implementation of the warehouse are studied in order to manage and analyze the massive ship scheduling data, which provides data support for the subsequent data mining algorithms. The ship scheduling data mining system, which consists of a data layer, an organization layer, a mining layer, and a decision layer, is then established. Each layer handles a different stage of the data mining task, from data preprocessing through data mining to knowledge representation, and together they form a complete system.
(2) To avoid repeatedly accessing the data table when mining association rules, this dissertation analyzes the relation between rough sets and association rules, and proposes a multi-dimensional association algorithm based on rough set equivalence classes. In this algorithm, the computation of multi-dimensional frequent itemsets is converted into the computation of equivalence classes over multiple attributes. The frequent itemsets and association rules it produces are therefore restricted to the dimensions of interest specified by the user. Compared with the Apriori algorithm, this algorithm reduces the number of database scans, which shortens the time needed to compute association rules.
(3) This dissertation studies the application of the multi-dimensional association rule mining algorithm to the analysis of cargo flows and ship routes. The essence of oceangoing transportation is the change in the position of cargo over time and space. During a voyage, a ship may load many kinds of cargo at one port and discharge them at different ports. For this reason, the cargo dimension data is preprocessed and converted into an information system. The proposed algorithm is then applied to the relations among ship route, ship type, cargo, and time, and yields rules of interest concerning ship type by route and cargo category by route.
(4) The positive region is a key concept in rough set theory and plays an important role in calculating the dependency degree of attributes, the ability of classification, and the significance of attributes. An improved algorithm for calculating the positive region is proposed. It promptly deletes objects that no longer need to be compared and so cuts down the number of object pairs in subsequent computation. Experiments on data sets from UCI show that the new algorithm is more efficient than the classical algorithm for calculating the positive region, especially on large data sets.
(5) It is well known that finding the shortest reduct is NP-hard. In this dissertation, a novel heuristic algorithm based on the ability of classification is proposed for attribute reduction. The new algorithm uses the cardinality of attributes as the heuristic and calculates the ability of classification instead of generating the positive region. Experiments on data sets from UCI show that it is more efficient than the positive region algorithm for attribute reduction in decision information systems.
(6) Fuel consumption in ship operation is a complicated process influenced by many factors, and in practice some fuel consumption attributes have missing values. In this dissertation, the incomplete fuel consumption information system is first transformed into a complete information system. To obtain valuable decision rules and support decisions on energy saving, the improved positive region algorithm is then used to compute the significance of the different fuel consumption factors, and an attribute reduction algorithm is used to compute the reduct of those factors.
Finally, conclusions are drawn and problems for further study are outlined.
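
As a reference point for items (4) and (6), the following Python sketch computes the positive region, the dependency degree, and attribute significance from their standard rough set definitions. It is a plain textbook version that partitions objects by hashing their condition values. It is not the improved algorithm described in the abstract, which is not specified here. The `fuel_table` rows, attribute names, and discretized values are hypothetical and assume a complete decision table with no missing values.

```python
from collections import defaultdict


def partition(table, attrs):
    """Group row indices into equivalence classes on the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def positive_region(table, cond_attrs, decision):
    """Rows whose condition class maps to exactly one decision value."""
    pos = set()
    for block in partition(table, cond_attrs):
        if len({table[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def dependency(table, cond_attrs, decision):
    """Dependency degree: |POS_C(D)| / |U|."""
    return len(positive_region(table, cond_attrs, decision)) / len(table)


def significance(table, cond_attrs, attr, decision):
    """Drop in dependency degree when `attr` is removed from `cond_attrs`."""
    rest = [a for a in cond_attrs if a != attr]
    return dependency(table, cond_attrs, decision) - dependency(table, rest, decision)


# Hypothetical, already discretized fuel consumption records (illustration only).
fuel_table = [
    {"speed": "high", "load": "full",  "weather": "rough", "fuel": "high"},
    {"speed": "high", "load": "full",  "weather": "calm",  "fuel": "high"},
    {"speed": "low",  "load": "full",  "weather": "calm",  "fuel": "low"},
    {"speed": "low",  "load": "empty", "weather": "calm",  "fuel": "low"},
    {"speed": "high", "load": "empty", "weather": "rough", "fuel": "high"},
    {"speed": "low",  "load": "full",  "weather": "rough", "fuel": "high"},
]
conditions = ["speed", "load", "weather"]

for attr in conditions:
    print(attr, round(significance(fuel_table, conditions, attr, "fuel"), 3))
```

In this toy table, removing `load` leaves the dependency degree unchanged, so it is a candidate for removal during attribute reduction, while removing `speed` or `weather` makes some rows indistinguishable from rows with a different fuel class.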
