Research and Application of Building Data Mining Solutions Based on SQL Server

【Author】 Hao Ruiji

【Advisors】 Tang Tianhao; Shi Weifeng

【Author Information】 Shanghai Maritime University, Control Theory and Control Engineering, 2004, Master's

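【Summary】 Data mining (DM) has been a hot topic of discussion and research in the information industry in recent years, and current DM research concentrates mostly on algorithms. Most DM systems cannot be seamlessly integrated with databases, the primary medium of data storage, and the lack of a standard data mining language restricts the application of DM technology to domain experts. Tightly coupling DM with databases and developing standard data mining languages have therefore become new research focuses in the DM field.

Against this background, and in connection with the Sino-French cooperation project "The Application of Data Mining in GIS", funded as a Shanghai Municipal Education Commission Key Discipline project (沪教委科(2001)71), this thesis studies methods for building data mining solutions based on SQL Server and the application of DM technology in GIS. The main work is as follows.

First, the thesis studies OLE DB For DM and PMML, two standard DM languages, and discusses the basic architecture of SQL Server, which supports both. On this basis it studies how to build data mining solutions on SQL Server Analysis Services, constructs the corresponding system architecture, and gives a software development example. Using the DDL defined in OLE DB For DM and DSO, DM models are created, trained, and stored from the client side and the server side respectively, achieving the goal of integrating DM, the database, and the application program.

Second, the thesis studies how to build a data mining solution by attaching self-developed DM algorithms externally to SQL Server, constructs the system architecture, and gives a software development example.

Third, the thesis focuses on how to integrate self-developed DM algorithms into SQL Server. It gives an overall implementation framework and implements the integration in VC++ 7.0, so that the DM algorithm is seamlessly integrated with the SQL Server database and conforms to OLE DB For DM. A DM model is built with this algorithm and prediction queries are run against it.

Fourth, the thesis studies the application of DM in GIS, discusses the integration of DM with GIS, proposes an architecture for that integration, and constructs an intelligent route design system. A self-developed composite clustering analysis algorithm is applied to this system as a module attached externally to SQL Server, and route design is completed. The thesis also builds a GIS data mining solution on SQL Server Analysis Services, creates a GIS data mining model, stores the model in standard PMML form, and presents detailed statistics on ship distribution.

Fifth, the thesis compares the advantages and disadvantages of the three approaches to building data mining solutions based on SQL Server and identifies an ideal approach, offering a new line of thinking for the wider application of data mining. It also analyzes and compares the routes designed with the composite clustering analysis algorithm and with the Microsoft clustering algorithm.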

【Abstract】 Data Mining (DM) has become one of the most popular research topics in the IT industry in recent years, but the research is mainly concentrated on algorithms. At present the application of DM is confined to domain experts, because almost all Data Mining systems are not seamlessly integrated with relational databases and because a standard Data Mining language is lacking. The integration of Data Mining with databases and the development of standard Data Mining languages have therefore become a hot spot in current DM research.

Against this background, and in connection with the Sino-French cooperation project "The Application of Data Mining in GIS", which was financed by Shanghai Education Commission Section (2001) 71 as a Key Discipline of the Shanghai Education Commission, the thesis studies the formulation of Data Mining solutions based on SQL Server as well as the application of DM technology in GIS.

SQL Server and Data Mining constitute the main line throughout the thesis. The author's objective is, on the basis of OLE DB For DM, to integrate Data Mining with the relational database and the application program. To this end the thesis discusses three ways of building Data Mining solutions based on SQL Server.

The first way is to use the Data Mining algorithms provided by SQL Server Analysis Services. These algorithms are implemented in accordance with OLE DB For DM, so they can be used directly to build Data Mining models from a relational database. The models are stored in PMML format and can be used in any application program. In this part, the author provides the system structure and gives an example of this kind of Data Mining solution.

The second way is to plug Data Mining algorithms designed by the author or by others into SQL Server as independent program modules. These modules can then be used to discover knowledge in the data warehouse.

The third way is to use the interfaces provided by SQL Server Analysis Services to integrate the Data Mining algorithms of third-party providers. The algorithms must conform to OLE DB For DM so that they can communicate with Analysis Services through a series of COM interfaces. In this part, the author develops a DLL based on a linear regression algorithm in the VC++ development environment. Once the DLL has been compiled, the algorithm can be used to build a Data Mining model and train it in a relational database.

Finally, the author studies how to integrate Data Mining into GIS and builds two Data Mining solutions for GIS based on SQL Server. In the first solution, a new, improved integrated clustering analysis algorithm is used to carry out route design. In the second solution, the author builds Data Mining models from the server side and from the client side of a GIS application program. In this solution the author also designs the route with the Microsoft clustering algorithm and obtains some other valuable results.

In addition, the author compares the three SQL Server-based Data Mining solutions discussed in the thesis and analyzes the route designs produced by the two different methods.

This thesis facilitates the formulation of Data Mining solutions based on SQL Server and bridges the gap between Data Mining and relational databases, so that Data Mining application programs can be developed and operated directly against a relational database or data warehouse. The thesis also shows how to formulate Data Mining solutions in GIS.

Hao Ruiji (Control Theory and Control Engineering)
Directed by Prof. Tang Tianhao and Prof. Shi Weifeng

  • 【Classification Number】 TP311.13
  • 【Citation Count】 11
  • 【Download Count】 1469