Research on the Application of Data Mining Technology in a Supermarket Data Warehouse

Original English title: Research of Data Mining Technology in the Supermarket's Data Warehouse

【Author】 Lin Ping (林萍)

【Advisor】 Jiang Bo (蒋波)

【Author information】 Dalian Maritime University, Computer Application Technology, 2003, Master's thesis

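【Abstract (translated from the Chinese)】 Traditional database management information systems cannot make good use of, or analyze, the large volumes of data accumulated in databases; data mining and data warehouse technology solve this problem well. This thesis first introduces the background of data mining and data warehousing, including the relationships between data mining and data warehouses, online analytical processing (OLAP), and statistics. It then discusses data mining patterns and data mining process models in detail, focusing on the dynamic clustering algorithm within the clustering pattern; the data are preprocessed with principal component analysis, and on this basis an improved dynamic clustering algorithm is proposed. As an application example, the thesis analyzes a supermarket's operational database, models it with a star schema, and constructs the logical model of a data warehouse; data are then extracted from the operational database, transformed, and the "valuable, clean" data loaded into the data warehouse, completing its construction. Following the Two Crows data mining process model, data on the types of products customers buy, their transactions, and their attributes are collected first; principal component analysis is then used to preprocess these data, lowering the correlation among variables and reducing their number; next, the improved dynamic clustering method is used to build the model, with outliers removed during clustering to improve cluster quality. The final result is a customer segmentation model, which is explained in some detail. Data mining and data warehouses are closely linked: the data warehouse is a sound foundation for data mining, and data mining lets the data warehouse play its decision-support role more fully, so the seamless integration of data mining and data warehouse systems is a hot topic in the data mining community. The thesis discusses this development trend further.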
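The star-schema modeling and the extract, transform, and load step mentioned in the abstract can be illustrated with a minimal sketch. The record does not give the thesis's actual schema, so every table and column name here (`dim_customer`, `dim_product`, `dim_date`, `fact_sales`) and the cleaning rule (keep only rows with a positive quantity and amount) are assumptions made for illustration, using Python's standard `sqlite3` module.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table surrounded by three dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            date_id     INTEGER REFERENCES dim_date(date_id),
            quantity    INTEGER NOT NULL,
            amount      REAL NOT NULL
        );
    """)

def load_clean_sales(conn, raw_rows):
    """Load rows (customer_id, product_id, date_id, quantity, amount).

    Assumed cleaning rule: skip rows whose quantity or amount is missing
    or not positive. Returns the number of rows loaded.
    """
    clean = [r for r in raw_rows
             if r[3] is not None and r[4] is not None and r[3] > 0 and r[4] > 0]
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", clean)
    conn.commit()
    return len(clean)

def spend_by_category(conn):
    """Join the fact table to the product dimension and total the amount per category."""
    rows = conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
    """).fetchall()
    return dict(rows)

# Small demonstration with made-up data.
demo = sqlite3.connect(":memory:")
build_star_schema(demo)
demo.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "milk", "dairy"), (2, "rice", "grain")])
load_clean_sales(demo, [(1, 1, 1, 2, 6.0), (1, 2, 1, 1, 4.5), (2, 1, 1, None, 3.0)])
print(spend_by_category(demo))
```

The fact table holds only keys and measures, while descriptive attributes live in the dimension tables, which is what makes per-category or per-customer aggregation a single join.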

【Abstract】 Traditional database management information systems cannot fully exploit or analyze the large volumes of data held in databases; data mining and data warehouse technology address this problem well. This thesis first introduces the background of data mining and data warehousing, including the relationship between data mining and data warehouses and their connections to OLAP and statistics, and then describes data mining patterns and the data mining process model in detail. It examines the dynamic clustering algorithm, preprocesses the data with principal component analysis, and then improves the dynamic clustering algorithm. As an application, the thesis analyzes the operational database, builds the logical model of the supermarket's data warehouse using a star schema, and then extracts data from the operational database. After being transformed with tools or programming languages, the "valuable and clean" data is loaded into the data warehouse, completing the physical model of the supermarket's data warehouse. Following the Two Crows data mining process model, product data, transaction data, and customer demographic data are collected and then preprocessed with principal component analysis, which lowers the correlation among variables and reduces their number. A model is built with the improved dynamic clustering method, and removing outliers improves the quality of its results. Finally, the resulting model is explained in detail. Data mining and data warehouses are closely connected: the data warehouse is an excellent platform for data mining, and data mining in turn helps the data warehouse fulfil its decision-support function. The seamless integration of data mining and data warehouses is therefore a hot topic, and the thesis discusses this trend further.
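The clustering step described in the abstract (dynamic clustering with outliers removed during the iterations) might look roughly like the sketch below. The abstract does not specify the thesis's improved algorithm, so this is a generic k-means-style relocation loop, not the author's method. The outlier rule (drop any point farther from its center than `cutoff` times the mean point-to-center distance), the function names, and the parameter values are all assumptions. The input is assumed to be already reduced by principal component analysis.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda k: dist(p, centers[k]))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def cluster_with_outlier_removal(points, init_centers, cutoff=3.0, max_iter=100):
    """Iterative relocation clustering that discards far-away points.

    Assumed rule (not taken from the thesis): in each pass, a point whose
    distance to its nearest center exceeds cutoff * (mean distance) is
    moved to the outlier list and no longer influences the centers.
    Use cutoff > 1 so that at least one point always survives.
    """
    centers = [tuple(c) for c in init_centers]
    active = [tuple(p) for p in points]
    outliers = []
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in active]
        dists = [dist(p, centers[l]) for p, l in zip(active, labels)]
        limit = cutoff * sum(dists) / len(dists)
        keep = [d <= limit for d in dists]
        outliers += [p for p, ok in zip(active, keep) if not ok]
        labels = [l for l, ok in zip(labels, keep) if ok]
        active = [p for p, ok in zip(active, keep) if ok]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, l in zip(active, labels) if l == k]
            # An empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else centers[k])
        if all(keep) and new_centers == centers:
            break  # nothing removed and no center moved: converged
        centers = new_centers
    labels = [nearest(p, centers) for p in active]
    return centers, labels, active, outliers

# Made-up data: two tight groups plus one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11), (100, 100)]
centers, labels, kept, dropped = cluster_with_outlier_removal(data, [(0, 0), (10, 10)])
print(centers, dropped)
```

A mean-based threshold is simple but can be inflated by the very outliers it is meant to catch; a median-based limit would be a more robust alternative under the same structure.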

  • 【Classification No.】TP311.13
  • 【Times cited】4
  • 【Downloads】444