Theory and Methods of Fuzzy, Dynamic Multi-dimensional Data Modeling
(模糊、动态多维数据建模理论与方法研究)

【Author】 Liu Qingbao

【Supervisors】 Zhang Weiming; Deng Su

【Author Information】 National University of Defense Technology, Management Science and Engineering, 2006, Ph.D.

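【Chinese Abstract】 Research on multi-dimensional data models provides the foundation for the widespread use of data warehouse and OLAP technology and has important theoretical and practical value. In a multi-dimensional data model, the dimension is a central concept. Because a dimension has a hierarchical structure, it lets users analyze the facts of interest at different granularities. In existing multi-dimensional data models, dimension hierarchies are built on complete partitions, which gives them clear levels and a stable structure. In the real world, however, the information describing objects is often uncertain and fuzzy, and the objects themselves evolve over time. It is therefore difficult to build an analytic dimension model with clear levels and a stable structure from static, sharply bounded complete partitions. Taking multi-dimensional data modeling under fuzzy and dynamic conditions as its goal, this thesis makes three proposals:

- a multi-dimensional data model that supports fuzzy dimensions, together with clustering-based methods for constructing fuzzy dimensions;
- a multi-level sliding window model for continuous data streams, with an online aggregation algorithm designed for it;
- a dynamic multi-dimensional data model for data streams, together with an online multi-dimensional aggregation method.

The main work and contributions of the thesis are the following four points.

1. A fuzzy multi-dimensional data model based on fuzzy quotient space theory. By introducing fuzzy equivalence relations, the thesis proposes a fuzzy dimension structure model that supports incomplete partitions. Compared with an ordinary dimension, the fuzzy dimension is extended in two respects. First, it turns the element aggregation relation between two dimension levels into a λ-parameterized relation, which supports element aggregation according to the parameter λ. Second, it establishes a λ-parameterized element aggregation relation within a level, which supports element aggregation over a stepwise hierarchical structure inside that level. The extension is backward compatible: an ordinary dimension is a special case of a fuzzy dimension. On the basis of fuzzy dimensions, the thesis formally describes the fuzzy multi-dimensional data model, the fuzzy data cube, aggregation operations, and the basic OLAP operations of roll-up, drill-down, selection, projection, slice, and dice. Using the theory and methods of fuzzy granular computing, it analyzes the fuzzy aggregation problem in depth and proposes three ways of handling it: the conservative method, the optimistic method, and the element-derived-set method. Compared with related multi-dimensional data models, the proposed fuzzy multi-dimensional data model goes beyond the limits of traditional multi-dimensional modeling theory. It offers stronger description and modeling capability for uncertain, fuzzy multi-dimensional data analysis.

2. Clustering-based construction of fuzzy dimensions. Fuzzy equivalence relations are hard to determine in practice. To address this, the thesis proposes two methods for constructing fuzzy dimensions, chosen according to the size of the object set: one based on fuzzy clustering and one based on relative-density clustering. It also proposes a relative-density-based clustering algorithm. This algorithm produces fairly stable results under different parameter settings; that is, the results are not overly sensitive to the parameters. It can also separate high-density clusters from connected low-density clusters, which yields clustering results at multiple density resolutions.

3. A multi-level window model for data streams and an online aggregation algorithm. When processing a data stream, detailed information is usually required for the most recent period, whereas only an overview is needed for periods further back. The thesis therefore proposes a multi-level time window model that models different periods of the stream at different time granularities. It also designs a multi-granularity aggregation tree structure and a pyramidal snapshot storage structure for expired data. Finally, it proposes algorithms for online aggregation and approximate querying of data streams. Performance analysis shows that these meet the stringent requirements of online aggregation and query analysis in terms of both storage space and processing time. They thus effectively solve the problem of aggregating and querying data streams under limited time and space.

4. A dynamic multi-dimensional data model for data streams and its online multi-dimensional aggregation method. Based on the time-dimension schema of the multi-level time window model, the thesis proposes a dynamic multi-dimensional data model for data streams. Compared with the multi-dimensional data model of an ordinary data warehouse, its main advantage is that it supports a time dimension of unbounded span and a continuously changing data set. Because the time dimension of a stream is unbounded, no storage system can keep every data item over the whole time domain. A multi-level time window model is therefore the natural choice for modeling the time dimension of a data stream. Because the data set changes rapidly and continuously, a multi-dimensional model for data streams should also support online multi-dimensional aggregation. The observed attributes of a data stream are representational, detailed, and technical in nature, which makes selecting and constructing dimensions for multi-dimensional online analytical processing of streams very difficult. To address this, the thesis:

- proposes an online clustering algorithm that supports dynamic modeling of stream dimensions;
- designs data structures that support online clustering and multi-dimensional aggregation of data streams;
- proposes an online aggregation and materialization method for the basic units of a data stream.

The thesis's research on fuzzy and dynamic multi-dimensional data modeling theory and methods has theoretical and practical significance. It can help promote the close integration and wide application of data warehouse, OLAP, and data mining technologies.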
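The sketch below makes the λ-cut idea concrete using the textbook transitive-closure method of fuzzy clustering. This is one standard way to obtain a fuzzy equivalence relation from pairwise similarities; it is not claimed to be the thesis's own construction. It assumes a reflexive, symmetric similarity matrix among the members of one dimension level. It then computes the max-min transitive closure (a fuzzy equivalence relation) and reads off one crisp partition per λ. The function names and similarity values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition: (a o b)[i][j] = max_k min(a[i][k], b[k][j])."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r: Matrix) -> Matrix:
    """Square r (r, r o r, ...) until it stops changing.

    For a reflexive, symmetric fuzzy similarity matrix the fixed point is
    its max-min transitive closure, i.e. a fuzzy equivalence relation.
    """
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2


def lambda_cut_partition(e: Matrix, lam: float) -> List[List[int]]:
    """Crisp classes of the lambda-cut {(i, j) : e[i][j] >= lam}.

    Because e is a fuzzy equivalence relation, every lambda-cut is an ordinary
    equivalence relation, so the class of i is simply {j : e[i][j] >= lam}.
    """
    n = len(e)
    assigned = [False] * n
    classes: List[List[int]] = []
    for i in range(n):
        if assigned[i]:
            continue
        cls = [j for j in range(n) if e[i][j] >= lam]
        for j in cls:
            assigned[j] = True
        classes.append(cls)
    return classes


if __name__ == "__main__":
    # Hypothetical similarity degrees among four members of one dimension level.
    r = [[1.0, 0.8, 0.4, 0.1],
         [0.8, 1.0, 0.5, 0.2],
         [0.4, 0.5, 1.0, 0.6],
         [0.1, 0.2, 0.6, 1.0]]
    e = transitive_closure(r)
    for lam in (1.0, 0.8, 0.6, 0.5):
        # Lower lambda merges classes: each cut is one level of a hierarchy.
        print(lam, lambda_cut_partition(e, lam))
```

Run as a script, it prints four partitions:

- λ = 1.0: [[0], [1], [2], [3]]
- λ = 0.8: [[0, 1], [2], [3]]
- λ = 0.6: [[0, 1], [2, 3]]
- λ = 0.5: [[0, 1, 2, 3]]

Each coarser partition is a union of classes from the finer one. This loosely mirrors the nested levels a fuzzy dimension is meant to capture. Which λ values become dimension levels is a modeling decision that the sketch leaves open.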

【Abstract】 Research on multi-dimensional data models is the technical foundation for the widespread application of data warehouse and OLAP techniques, and it has important theoretical and practical value. The dimension is a central concept in a multi-dimensional data model because its hierarchical structure allows users to analyze the facts of interest at different granularities. In existing multi-dimensional data models, dimension hierarchies are usually built on complete partitions, which gives them clear levels and a stable structure. However, the information that describes real-world objects is often uncertain and fuzzy, and the objects themselves evolve dynamically. It is therefore difficult to build an analytic dimension model with clear levels and a stable structure. Taking multi-dimensional data modeling under fuzzy and dynamic conditions as its research goal, this thesis makes three proposals. First, a multi-dimensional data model that supports fuzzy dimensions, together with clustering-based methods for constructing them. Second, a multi-level sliding window model for continuous data streams and an online aggregation algorithm based on it. Third, a dynamic multi-dimensional data model for data streams with a corresponding online multi-dimensional aggregation method. The main contributions and innovations of this thesis are as follows.

1. Proposes a fuzzy multi-dimensional data model based on fuzzy quotient space theory. By introducing a fuzzy equivalence relation, the thesis obtains a fuzzy dimension structure model that supports incomplete partitions.
The proposed fuzzy dimension extends the ordinary notion of a dimension in two main respects. First, it turns the element aggregation relation between two dimension levels into a λ-parameterized relation and supports aggregation parameterized by λ. Second, it establishes a λ-parameterized aggregation relation within a level and supports stepwise hierarchical aggregation inside that level. The extension is also backward compatible: an ordinary dimension is a special case of a fuzzy dimension. Building on the fuzzy dimension, the thesis formally describes the fuzzy multi-dimensional data model, the fuzzy data cube, aggregation operations, and basic OLAP operations such as roll-up, drill-down, selection, projection, slice, and dice. The thesis analyzes the fuzzy aggregation problem in depth using the theory and methods of fuzzy granular computing. From this analysis it proposes three processing methods: the conservative method, the optimistic method, and the element-derived-set method. Compared with related work, the fuzzy multi-dimensional data model presented here is grounded in fuzzy quotient space theory. It overcomes limitations of traditional multi-dimensional data modeling theory and strengthens the ability to describe and model uncertain, fuzzy multi-dimensional data for analysis.

2. Proposes a clustering-based method for constructing fuzzy dimensions. Fuzzy equivalence relations are difficult to determine in practice. To address this, the thesis proposes two approaches to fuzzy dimension construction, chosen according to the size of the object set: one based on fuzzy clustering and one based on relative-density clustering. It also proposes a relative-density-based clustering algorithm that produces relatively stable results under different parameter settings; that is, the clustering results are not overly sensitive to the parameters.
High-density clusters can also be distinguished from connected low-density clusters, so clustering results at multiple density resolutions can be obtained.

3. Proposes a multi-level sliding window model for data streams and an online aggregation algorithm. In data stream processing, more detailed information is generally needed about the recent period than about periods further in the past. Accordingly, a multi-level time window model is proposed that describes different periods of a data stream at different time granularities. The thesis also designs a multi-granularity aggregation tree and a pyramidal snapshot storage structure for expired data. It further proposes online aggregation methods and approximate query algorithms that answer aggregate queries over data streams at limited space and time cost. Performance analysis shows that these structures meet the stringent requirements of online aggregation and query analysis over data streams in terms of both storage space and processing time.

4. Proposes a dynamic multi-dimensional data model for data streams together with the corresponding online multi-dimensional aggregation methods. A multi-dimensional data model for online analytical processing of data streams is proposed, based on the time-dimension schema of the multi-level time window model. Compared with the ordinary multi-dimensional data model of a data warehouse, it has the advantage of supporting a time dimension of unbounded span and continuously changing data sets. Because the time dimension is unbounded, no storage system can keep all the data over the whole time domain. Modeling the time dimension of a data stream with a multi-level time window model is therefore the inevitable choice.
The rapid and continuous change of the data means that a suitable model must support online multi-dimensional aggregation. The observed attributes of a data stream are representational, detailed, and technical in nature. As a result, selecting and constructing dimensions for multi-dimensional online analytical processing of data streams is very difficult. To address this, the thesis:

- presents an online clustering algorithm that supports dynamic dimension modeling for data streams;
- designs a data structure that supports online clustering and multi-dimensional aggregation of data streams;
- proposes an online aggregation and materialization method for the basic units of a data stream.

The research on fuzzy and dynamic multi-dimensional data modeling in this thesis has theoretical and practical significance. It can help promote the close integration and wider use of data warehouse, OLAP, and data mining techniques.
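A simple tilted time window illustrates the multi-level time window idea: fine granularity for recent data, coarser granularity for older data, and bounded memory overall. The following is a minimal sketch under stated assumptions. It is not the thesis's multi-granularity aggregation tree or pyramidal snapshot structure. The class `TiltedTimeWindow`, the `(fanin, capacity)` level parameters, and the count/sum summary are all hypothetical. Each level keeps a bounded number of slots, and overflowing slots are merged into one coarser slot. The coarsest level discards its oldest slot, so between calls the window never holds more slots than the sum of the level capacities, however long the stream runs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Agg:
    """Distributive summary of one time slot: count and sum of values."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: "Agg") -> "Agg":
        return Agg(self.count + other.count, self.total + other.total)


class TiltedTimeWindow:
    """Hypothetical multi-level window: recent slots fine, older slots coarse.

    levels[i] is a (fanin, capacity) pair. When level i holds more than
    `capacity` slots, its `fanin` oldest slots are merged into one slot of
    level i + 1; slots overflowing the last level are discarded as expired.
    """

    def __init__(self, levels: List[Tuple[int, int]]):
        if not levels:
            raise ValueError("need at least one level")
        for fanin, capacity in levels:
            if not 1 <= fanin <= capacity:
                raise ValueError("need 1 <= fanin <= capacity")
        self.levels = levels
        # Each buffer is ordered oldest (left) -> newest (right).
        self.buffers: List[Deque[Agg]] = [deque() for _ in levels]
        # Number of base time units covered by one slot at each level.
        self.span = [1]
        for fanin, _ in levels[:-1]:
            self.span.append(self.span[-1] * fanin)

    def add_unit(self, values: List[float]) -> None:
        """Close one base time unit whose raw readings are `values`."""
        self._push(0, Agg(len(values), float(sum(values))))

    def _push(self, i: int, slot: Agg) -> None:
        buf = self.buffers[i]
        buf.append(slot)
        fanin, capacity = self.levels[i]
        if len(buf) <= capacity:
            return
        if i + 1 == len(self.levels):
            buf.popleft()  # expired beyond the coarsest level
            return
        merged = Agg()
        for _ in range(fanin):
            merged = merged.merge(buf.popleft())
        self._push(i + 1, merged)

    def sum_last(self, units: int) -> Tuple[float, int]:
        """Approximate sum over the most recent `units` base units.

        Whole slots are taken newest-first while they fit, so the answer
        covers `covered` <= units base units (exact when boundaries align).
        """
        total, covered = 0.0, 0
        for i, buf in enumerate(self.buffers):
            for slot in reversed(buf):
                if covered + self.span[i] > units:
                    return total, covered
                total += slot.total
                covered += self.span[i]
        return total, covered


if __name__ == "__main__":
    # Hypothetical levels: 4 base units -> one 4-unit slot, 3 of those -> one 12-unit slot.
    w = TiltedTimeWindow([(4, 4), (3, 3), (2, 2)])
    for t in range(1, 21):
        w.add_unit([float(t)])  # one reading per unit, equal to t
    print(w.sum_last(8))  # ends on a slot boundary: exact
    print(w.sum_last(6))  # boundary inside a 4-unit slot: covers only 4 units
```

Run as a script, it prints (132.0, 8) and then (74.0, 4). The last 8 units fall exactly on slot boundaries. For the last 6 units, the sketch can only report the 4 most recent units, because units 13 to 16 are already merged into one slot. Count and sum are distributive, so merging slots loses only time resolution, not accuracy at slot granularity. A non-distributive measure such as a median would need a different summary.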

  • 【Classification Number】 TP311.13; TP18
  • 【Times Cited】 1
  • 【Downloads】 835