

Research on ETL Systems for Building Data Warehouses

【Author】 李恒锐 (Li Hengrui)

【Advisor】 王建仁 (Wang Jianren)

【Author Information】 Xi'an University of Technology (西安理工大学), Management Science and Engineering, 2009, Master's

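【Abstract (Chinese version)】 Data warehouse systems integrate an enterprise's information system resources and play an increasingly important role in business decision-making. How to integrate data from multiple sources into the data warehouse efficiently and at low cost, after complex processing and computation, has become a pressing problem in data warehouse construction. The ETL system is the bridge and link between the data sources and the data warehouse. It is dedicated to processing transactional data into data warehouse data and directly affects the construction and operation of the warehouse, so research on and development of ETL systems is an essential part of data warehouse construction. At present, ETL systems in China are essentially monopolized by foreign database vendors and data integration vendors, which makes them expensive to purchase and use; the demand of small and medium-sized enterprises (SMEs) for data warehouses is constrained by the cost of ETL systems. Designing and developing an ETL system suited to Chinese SMEs, in order to support building and using data warehouses, deepen enterprise information system construction, and raise the level of business management, has become one direction of information system development in China. Set against the application background of building a data warehouse, and taking the characteristics of information systems into account, this thesis first compares the features of traditional ETL systems and identifies their shortcomings in enterprise informatization. The main shortcomings are insufficient support for heterogeneous data sources, inadequate ease of use, and high demands on the user's own skills. To address these shortcomings, the thesis proposes targeted solutions. Heterogeneous data sources are abstracted as data nodes in a network, and the commonality of the data nodes replaces the heterogeneity of the sources, so that to the ETL system every database system is a "plug-and-play" data resource. Differences in data types and system structure between database systems are played down by relying on the characteristics of the data itself. Data resources are divided into two types, static data and dynamic data; by studying how each type arises in business operations and comparing it point to point with data warehouse data, the thesis proposes a data acquisition method based on the characteristics of the data itself. Next, it analyses common problems in the data transformation and data loading processes and proposes solutions. Beyond this theoretical research, the thesis puts the improvements into practice, designing and implementing a cost-effective, general-purpose ETL system that is not tied to any specific solution. Finally, an analysis of enterprise application cases shows that the ETL system can indeed help enterprises improve operating efficiency and reduce operating costs, and that it provides strong support for enterprise informatization.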

【Abstract】 ETL (Extract, Transform and Load) is dedicated to processing source data into data warehouse data. As the bridge and link between the data sources and the data warehouse, it directly affects how the data warehouse is built and run, so improving data warehouse applications requires close attention to ETL. At present, ETL tools in China are monopolized by foreign database vendors and data integration vendors. This makes them expensive to purchase and use, and limits the adoption of data warehouses by small and medium-sized enterprises (SMEs). Designing and developing an ETL system that suits SMEs is therefore one direction of development: it supports enterprises in building and using data warehouses, deepens enterprise informatization, and raises the level of enterprise management. The purpose of this thesis is to design and develop a lower-cost, higher-efficiency generic ETL system and to explore the technology needed to implement such a tool. I hope it can help SMEs build data warehouse and OLAP systems and then carry their informatization further.
In this thesis I follow software engineering principles in designing and developing the ETL system. Based on a study of existing ETL systems, I design and develop an ETL system that is easy to develop and efficient to execute. The main research work and results are as follows. The requirements of SMEs for ETL tools are analysed in depth. Once the users' requirements are understood and confirmed, the ETL system is divided into three business processes: data extraction, data transformation and data loading. Configuration files define and display the business logic, and a simple interface helps users understand and use the system. In the data extraction and data loading processes, a connection pool and JDBC are used to improve the stability and security of database connections and to support heterogeneous, cross-platform databases. In the data transformation process, combined configuration files describe the transformation flow, which makes the process highly flexible and easy to design and modify. Testing of the ETL system shows that the approach is feasible, that the development process is simple and easy to control, and that the development cost is low. The ETL system has been applied successfully in a real multi-user environment and has been well received by its users, which indicates that its design and development are effective.
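
The idea of describing the transformation flow in a configuration file rather than in code can be illustrated with a minimal Java sketch. The abstract does not give the file format or any class names, so everything here is an assumption made for illustration: the comma-separated `source,target,op` rule lines, the `ConfigDrivenTransform` class, and the operation names `copy`, `trim`, `upper` and `int` are hypothetical and are not taken from the thesis.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: transformation rules are read as data, not hard-coded. */
public class ConfigDrivenTransform {

    /** One rule: read a source column, apply a named operation, write a target column. */
    static final class Rule {
        final String source;
        final String target;
        final String op;

        Rule(String source, String target, String op) {
            this.source = source;
            this.target = target;
            this.op = op;
        }
    }

    // Named operations that a rule line may refer to (names are assumptions).
    static final Map<String, Function<String, Object>> OPS = new HashMap<>();
    static {
        OPS.put("copy", v -> v);
        OPS.put("trim", v -> v.trim());
        OPS.put("upper", v -> v.trim().toUpperCase());
        OPS.put("int", v -> Integer.parseInt(v.trim()));
    }

    /** Parses "source,target,op" lines; blank lines and '#' comments are skipped. */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) {
                continue;
            }
            String[] parts = s.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            String op = parts[2].trim();
            if (!OPS.containsKey(op)) {
                throw new IllegalArgumentException("Unknown operation: " + op);
            }
            rules.add(new Rule(parts[0].trim(), parts[1].trim(), op));
        }
        return rules;
    }

    /** Applies every rule to one extracted row and returns the transformed row. */
    static Map<String, Object> apply(List<Rule> rules, Map<String, String> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Rule r : rules) {
            String raw = row.get(r.source);
            // A missing source column is passed through as null in this sketch.
            out.put(r.target, raw == null ? null : OPS.get(r.op).apply(raw));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rule> rules = parseRules(List.of(
                "# source,target,op",
                "cust_name,customer_name,upper",
                "qty,quantity,int"));
        Map<String, String> row = Map.of("cust_name", " alice ", "qty", "42");
        System.out.println(apply(rules, row));
    }
}
```

In a design of this kind, changing a column mapping means editing a rule line rather than recompiling code, which is one plausible reading of the flexibility the abstract claims for its configuration files.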

  • 【Classification No.】TP311.13
  • 【Times Cited】5
  • 【Downloads】179

