
Research and Application of the Partition Technology in Real-Time Data Warehouse

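【Author】 Zhao Hongbo (赵宏博)

【Advisor】 Wang Daling (王大玲)

【Author information】 Northeastern University (东北大学), Computer Software and Theory, 2008, Master's thesis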

【Abstract】 As enterprises demand ever fresher data, traditional data warehouse technology can no longer meet real-time requirements. Real-time data warehouse technology has emerged to provide enterprises and organizations with real-time or near-real-time data. Because data now arrives in real time, massive volumes are stored every day, which exposes the problem of how to manage data efficiently in a real-time data warehouse environment. Achieving efficient data management through database management techniques is one way to address this problem. Targeting the characteristics of real-time data warehouses, and aiming to improve their storage and query efficiency, this thesis studies the contribution of partitioning to query efficiency in real-time data warehouses and the problem of partitioned storage of incremental data. Starting from the characteristics of traditional database schemas, it proposes a partition model based on partition tables together with a systematic modeling process that covers model creation, schema extraction, and data migration, and it argues in detail for the advantages of this partition table from the perspectives of theoretical basis, experimental evidence, generality, and extensibility. For the characteristics of real-time data warehouses, it proposes an improved horizontal partitioning algorithm and an improved server-side algorithm, and demonstrates in detail how these improve query efficiency. On this basis, a partition engine built around these partitioning techniques is designed and applied in the development of the national marine environment data warehouse. Experiments show that the proposed partition-table-based partitioning strategy and the dynamic partitioning algorithm perform well in data storage and querying and achieve the expected goals.

  • 【Online publication contributor】 Northeastern University (东北大学)
  • 【Online publication issue】 2012, Issue 03