Research on the Storage Technique of Data Cube Based on Dimension Hierarchy

(Original title: 基于维层次数据立方体存储技术的研究)

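【Author】 Peng Zhipeng (彭志鹏)

【Supervisor】 Jiang Waiwen (蒋外文)

【Author Information】 Central South University, Computer Application Technology, 2008, Master's thesis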

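【Abstract (Chinese version)】 The data cube is a core concept of data warehousing and online analytical processing (OLAP). To respond quickly to complex multidimensional queries in OLAP systems, the data cube is usually precomputed and stored, yet its enormous size makes both its computation and its storage difficult. Reducing disk space cost and improving query performance have therefore become two important but mutually constraining goals of data cube research, and solving these problems at the root requires effective ways to organize the data cube.

This thesis first improves and implements ICODH, a method for computing dimension-hierarchy data cubes. Under a given dimension order, the algorithm computes recursively, level by level, from the bottom up; within a single dimension, it computes aggregates iteratively from the coarse-grained level to the fine-grained level, and it shares sorts to reduce disk reads and writes, which shortens the computation time of the dimension-hierarchy cube. Second, the thesis studies dimension-hierarchy encoding and proposes a method that hierarchically encodes dimension tables while preserving the semantic information of the original data cube. Together, these two contributions speed up data cube computation and improve query performance.

The condensed data cube is an effective mechanism for reducing data cube size, but it still contains a large amount of prefix redundancy, both within a cuboid (intra-cuboid) and across cuboids (inter-cuboid). To address this, the thesis extends and implements IDHC, a dimension-hierarchy-based data cube organization that combines base single tuple (BST) condensing with intra-cuboid prefix sharing. Exploiting the properties of dimension hierarchies, IDHC clusters cube tuples that have the same grouping dimension set (or single-value dimension set), and the tuples within a cluster are organized with shared prefixes to shrink the compressed cube further. Because sharing prefixes would otherwise require many comparisons between tuples when they are physically stored, the thesis also proposes an algorithm that generates tuples in batches; it eliminates comparisons between tuples in cuboids that contain only a single grouping dimension and computes IDHC in batch mode.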
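Two ideas in the abstract, hierarchical encoding of dimension members and sharing one sort across every level of a hierarchy, can be made concrete with a small example. This is a minimal sketch under stated assumptions, not ICODH or the thesis's encoding scheme: it assumes a hypothetical three-level location hierarchy (country, province, city) and a fixed bit width per level (`bits_per_level`, an illustrative parameter). Each member receives a local index within its parent, and the indices are packed so that shifting a city code right yields its province and country codes. Because facts sorted by the leaf code are also sorted by every coarser prefix, one sort lets each level, from coarsest to finest, be aggregated in a single sequential pass.

```python
from itertools import groupby

def build_hierarchical_codes(paths, bits_per_level):
    """Hypothetical encoder: pack each member's local index (its position
    among its parent's children) into one integer, coarsest level first."""
    children = {}  # parent path prefix -> {member: local index}
    codes = {}
    for path in paths:
        code = 0
        for depth, member in enumerate(path):
            local = children.setdefault(path[:depth], {})
            if member not in local:
                if len(local) >= 1 << bits_per_level[depth]:
                    raise ValueError(f"too many members under {path[:depth]}")
                local[member] = len(local)
            code = (code << bits_per_level[depth]) | local[member]
        codes[path] = code
    return codes

def code_at_level(leaf_code, bits_per_level, level):
    """Roll a leaf code up to `level` (0 = coarsest) by dropping finer bit fields."""
    return leaf_code >> sum(bits_per_level[level + 1:])

def rollup_all_levels(facts, bits_per_level):
    """facts: (leaf_code, measure) pairs. Sort once by leaf code; every
    coarser prefix is then already in order, so groupby works per level."""
    facts = sorted(facts)  # the single shared sort
    results = {}
    for level in range(len(bits_per_level)):  # coarse -> fine
        rows = []
        for key, grp in groupby(facts, key=lambda f: code_at_level(f[0], bits_per_level, level)):
            rows.append((key, sum(m for _, m in grp)))
        results[level] = rows
    return results

if __name__ == "__main__":
    paths = [("CN", "Hunan", "Changsha"), ("CN", "Hunan", "Zhuzhou"),
             ("CN", "Hubei", "Wuhan"), ("US", "Texas", "Austin")]
    bits = (2, 4, 4)
    codes = build_hierarchical_codes(paths, bits)
    facts = [(codes[paths[0]], 10), (codes[paths[2]], 5), (codes[paths[1]], 7),
             (codes[paths[0]], 3), (codes[paths[3]], 4)]
    print(rollup_all_levels(facts, bits)[1])  # [(0, 20), (1, 5), (16, 4)]
```

The fixed per-level bit widths keep roll-up to a shift, at the cost of capping how many children a member may have; a real encoder would size the fields from the dimension table.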

【Abstract】 The data cube is the core concept of data warehousing and online analytical processing (OLAP). To answer complex multidimensional queries promptly in OLAP applications, the data cube usually has to be precomputed and saved to disk, but its large size causes many problems when it is computed and stored. Reducing disk storage cost and improving query performance are both important but conflicting goals of data cube research, and resolving these problems requires effective structures for organizing the data cube.

This thesis improves an approach named ICODH, which computes tuples recursively from the bottom up; within one dimension, it computes aggregates iteratively from the coarse granularity to the fine granularity. By sharing sorting costs, it reduces disk reads and writes and thus the computation time of the dimension-hierarchy cube. In addition, based on a study of dimension hierarchy encoding techniques, the thesis proposes an effective method for encoding dimension tables that preserves the semantic relations of the original data cube through its compression mechanism. Together, these two contributions speed up data cube computation and improve query performance.

The condensed data cube has been proposed as an effective way to reduce data cube size, but much prefix redundancy remains, both intra-cuboid and inter-cuboid. To address this, the thesis extends a data cube structure named IDHC, which combines two techniques: BST condensing and intra-cuboid prefix sharing. Exploiting the characteristics of dimension hierarchies, it clusters cube tuples that have the same grouping dimension set (or single dimension set), and tuples in the same cluster share prefixes, which reduces the cube size. Meanwhile, to eliminate tuple comparisons when these tuples are stored on disk, an algorithm that generates tuples in batches is proposed. It eliminates comparisons among tuples in cuboids that contain only one grouping dimension and computes IDHC in batch mode.
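Prefix sharing within a cluster can be illustrated with front coding: once the tuples of one cluster are sorted, each tuple stores only how many leading dimension values it shares with its predecessor, plus the remaining suffix and its measure. This is a generic sketch under that assumption, not the IDHC physical layout; `encode_cluster`, `decode_cluster`, and the sample tuples are hypothetical, and BST condensing and batch generation are not modeled. Note that computing the shared length compares every tuple with its predecessor, which is the kind of per-tuple comparison the batch algorithm described above is meant to avoid.

```python
def encode_cluster(rows):
    """rows: (dimension-value tuple, measure) pairs from one cluster.
    Returns (shared, suffix, measure) triples, where `shared` counts the
    leading dimension values equal to those of the previous sorted tuple."""
    encoded = []
    prev = ()
    for dims, measure in sorted(rows):
        shared = 0
        # Tuple-by-tuple comparison needed to find the common prefix.
        while shared < min(len(prev), len(dims)) and prev[shared] == dims[shared]:
            shared += 1
        encoded.append((shared, dims[shared:], measure))
        prev = dims
    return encoded

def decode_cluster(encoded):
    """Rebuild full tuples by reattaching each stored suffix to the
    shared prefix of the previously decoded tuple."""
    rows = []
    prev = ()
    for shared, suffix, measure in encoded:
        dims = prev[:shared] + suffix
        rows.append((dims, measure))
        prev = dims
    return rows

if __name__ == "__main__":
    rows = [(("CN", "Hunan", "Changsha"), 13), (("CN", "Hunan", "Zhuzhou"), 7),
            (("CN", "Hubei", "Wuhan"), 5), (("US", "Texas", "Austin"), 4)]
    enc = encode_cluster(rows)
    for entry in enc:
        print(entry)
    print(sum(len(s) for _, s, _ in enc), "stored values instead of 12")  # 9
```

Sorting first is what makes the shared prefixes long: tuples that agree on coarse hierarchy levels end up adjacent, so only their differing finer-level values are stored.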

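  • 【Online Publication Submitter】 Central South University
  • 【Online Publication Year/Issue】 2009, Issue 01
  • 【Classification Number】 TP311.13
  • 【Times Cited】 2
  • 【Downloads】 158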