Research and Implementation of Data Extraction Technology for Data Warehouses

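【Author】 Yu Xiaoguang (喻小光)

【Supervisor】 Chen Weibin (陈维斌)

【Author information】 Huaqiao University, Computer Application Technology, 2002, Master's thesis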

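【Chinese abstract】 This thesis describes the design and implementation of a general-purpose data extraction tool for data warehouses. With social progress and technological development, analysis and decision-making have become the lifeline of every industry. Owing to its advantages in how data is stored and organized, data warehouse technology provides strong data support for decision support systems. The software integrates, transforms, cleanses, and optimizes source data before loading it into the data warehouse, ensuring that the warehouse holds high-quality data and laying the foundation for effective decision analysis.

Chapter 1 explains the significance of this work and briefly analyzes data warehouse technology; Chapters 2 through 6 present the design approach and the implementation of the system; the final chapter summarizes the work and discusses future directions.

The software uses a three-tier architecture, with COM technology and MTS used to develop and manage the middle-tier components. The integration, transformation, cleansing, and optimization modules are all encapsulated as COM components and built as .DLL files, which makes the system easier to upgrade, maintain, and port.

This thesis analyzes a wide variety of data extraction methods, groups them into integration, transformation, and cleansing, and argues that the data should also be optimized, for example through data smoothing and normalization, to better support data mining.

The software supports extraction from most structured and semi-structured data sources, including various relational databases, Excel spreadsheets, delimited text files, and XML files. Extraction from XML files is one of its distinctive features: we propose a rule-driven method for converting XML schema data into a relational schema, which is used to extract XML data.

The system encapsulates a user-defined extraction process as an extraction package (Package), so that it can be defined once and used many times. To improve the execution efficiency of extraction packages, we use Microsoft DTS as the transfer tool, which greatly speeds up data extraction.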
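The abstract does not spell out the rule format behind the rule-driven XML-to-relational conversion, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes a hypothetical `RULES` dictionary in which each target table names the repeating XML element that yields one row and maps each column to either an attribute (`@name`) or a child path. It uses Python's standard `xml.etree.ElementTree` in place of the DOM parser listed in the keywords, and an in-memory SQLite database as a stand-in for the target relational store; `xml_to_rows` and `load_into_sqlite` are hypothetical helper names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping rules (not the thesis's rule syntax). For each target
# table: "row" is the path of the repeating element that yields one row, and
# "columns" maps a column name to its source. A source starting with "@" is an
# attribute of the row element; anything else is a child path whose text is used.
RULES = {
    "customer": {
        "row": "./customer",
        "columns": {"id": "@id", "name": "name", "city": "address/city"},
    },
}


def xml_to_rows(xml_text, rules):
    """Apply the mapping rules to an XML document; return {table: [row, ...]}."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table, rule in rules.items():
        rows = []
        for elem in root.findall(rule["row"]):
            row = {}
            for column, source in rule["columns"].items():
                if source.startswith("@"):
                    row[column] = elem.get(source[1:])   # None if attribute absent
                else:
                    row[column] = elem.findtext(source)  # None if child absent
            rows.append(row)
        tables[table] = rows
    return tables


def load_into_sqlite(conn, tables, rules):
    """Create one TEXT-typed table per rule and insert the extracted rows.

    Table and column names come from the trusted rule set, not from the XML.
    """
    for table, rows in tables.items():
        cols = list(rules[table]["columns"])
        col_defs = ", ".join(f"{c} TEXT" for c in cols)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
        marks = ", ".join("?" for _ in cols)
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})",
            [tuple(row[c] for c in cols) for row in rows],
        )
    conn.commit()


if __name__ == "__main__":
    doc = """<customers>
      <customer id="1"><name>Alice</name><address><city>Xiamen</city></address></customer>
      <customer id="2"><name>Bob</name></customer>
    </customers>"""
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(conn, xml_to_rows(doc, RULES), RULES)
    print(conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall())
    # [('1', 'Alice', 'Xiamen'), ('2', 'Bob', None)]
```

Keeping the mapping in data rather than in code is what makes this style of conversion rule-driven: supporting a new XML layout means adding a rule entry, not writing a new parser.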

【Abstract】 This thesis discusses the design and implementation of a general-purpose data extraction tool for data warehouses. With the development of society and the advance of technology, analysis and decision-making have become the lifeline of every walk of life. By virtue of its data storage format and organizational structure, a data warehouse provides strong data support for analysis and decision-making. This software ensures that the data warehouse receives high-quality data: it takes raw data from the data sources and loads it into the data warehouse after integrating, converting, cleansing, and optimizing it.

The first chapter explains the significance of this work and gives a brief analysis of data warehouse technology. Chapters 2 to 6 introduce the design of the system and its implementation; the last chapter summarizes the thesis and looks ahead to the future of data warehouse technology.

The system uses a three-tier architecture. We use COM technology to develop the middle-tier components and MTS to manage them. The function modules, such as the integration and conversion modules, are packaged as COM components, which makes the system easier to upgrade, maintain, and port.

In this thesis we group the common data extraction methods into integration, conversion, and cleansing, and propose data optimization, such as data smoothing and data standardization, to support data mining more effectively.

The software can extract data from most structured and semi-structured data sources, such as relational databases, Excel files, delimited text files, and XML documents. Extracting data from XML documents is a distinctive feature of the software. We propose a rule-driven method that transforms XML schema data into a relational schema, and on this basis implement data extraction from XML documents.

The system packages the extraction work defined by the user into an "Extract Package", which can be reused many times. To improve execution efficiency, we adopt MS DTS as our transfer tool, which speeds up data extraction.
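The abstract names data smoothing and standardization as optimization steps and describes packages that are defined once and run many times, but it does not specify the algorithms. The sketch below assumes two common textbook choices, smoothing by bin means over equal-depth bins and min-max normalization to [0, 1], and models a package as an ordered list of row-transforming steps. `ExtractPackage` and the step functions are hypothetical names; this is not the DTS package API that the thesis actually used for execution.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def drop_incomplete(required: Iterable[str]) -> Step:
    """Cleansing step: discard rows in which any required field is None or ''."""
    required = list(required)

    def step(rows: List[Row]) -> List[Row]:
        return [r for r in rows if all(r.get(f) not in (None, "") for f in required)]

    return step


def smooth_by_bin_means(field: str, bin_size: int) -> Step:
    """Smoothing step: sort rows by `field`, split them into equal-depth bins of
    `bin_size` rows, and replace each value with its bin mean (output is sorted)."""
    def step(rows: List[Row]) -> List[Row]:
        ordered = sorted(rows, key=lambda r: r[field])
        out: List[Row] = []
        for i in range(0, len(ordered), bin_size):
            chunk = ordered[i:i + bin_size]
            mean = sum(r[field] for r in chunk) / len(chunk)
            out.extend({**r, field: mean} for r in chunk)  # new dicts, input untouched
        return out

    return step


def min_max_normalize(field: str) -> Step:
    """Normalization step: rescale `field` linearly to [0, 1]."""
    def step(rows: List[Row]) -> List[Row]:
        if not rows:
            return rows
        values = [r[field] for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column maps to 0.0 to avoid dividing by zero.
        return [{**r, field: 0.0 if span == 0 else (r[field] - lo) / span} for r in rows]

    return step


class ExtractPackage:
    """Hypothetical stand-in for an extraction package: steps defined once, run on many batches."""

    def __init__(self, name: str, steps: List[Step]):
        self.name = name
        self.steps = list(steps)

    def run(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


if __name__ == "__main__":
    pkg = ExtractPackage("sales", [
        drop_incomplete(["region", "amount"]),
        smooth_by_bin_means("amount", 3),
        min_max_normalize("amount"),
    ])
    batch = [{"region": r, "amount": a} for r, a in
             [("A", 4), ("B", 8), ("C", 15), ("D", 21), ("E", 24), ("F", 27), ("G", None)]]
    print([row["amount"] for row in pkg.run(batch)])
    # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

Because each step returns new row dictionaries instead of mutating its input, the same package object can be run safely against successive batches, which mirrors the define-once, use-many-times idea.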

【Keywords (Chinese)】 data extraction; data warehouse; three-tier architecture; COM; XML; DTS; DOM
【Key words】 Data Warehouse; Data Extraction; Three-tier Architecture; COM; XML; DTS; DOM
  • 【Online publication submitter】 Huaqiao University
  • 【Online publication issue】 2002, Issue 02
  • 【Classification number】 TP311.131
  • 【Times cited】 4
  • 【Downloads】 155