Research on Domain-Driven Knowledge Discovery Method

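【Author】 Zhu Zhengxiang (朱正祥)

【Supervisor】 Gu Jifa (顾基发)

【Author Information】 Dalian University of Technology, Management Science and Engineering, 2010, Ph.D.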

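【Chinese Abstract】 Data mining is a widely used process for discovering latent, useful knowledge in massive data. Traditional data mining, however, is data-driven and emphasizes automating the mining process; its results often contain large amounts of redundant or even incorrect knowledge that cannot be applied directly to real-world practice. Data mining aimed at knowledge that users find interesting and can act on must carry domain knowledge, especially experts' preferences, experience, knowledge, and wisdom, through the entire mining process, turning data-driven data mining into domain-driven data mining and thereby bridging the gap between academic research and real-world application. On the other hand, much knowledge resides in people, particularly in domain experts with rich theoretical and practical experience. Solving complex problems therefore requires treating domain experts themselves as mining targets in order to obtain the consensus knowledge of an expert group; the two modes of knowledge discovery then complement and validate each other, yielding more comprehensive and accurate knowledge. Drawing on management science, computer science, and the meta-synthesis methodology, this dissertation studies methods, driven by domain knowledge and taking both data and domain experts as mining targets, for obtaining interesting and actionable knowledge, and calls this approach domain-driven knowledge discovery. The main research contents are:

1. Building on an analysis of the shortcomings of traditional data mining, it studies how to carry domain knowledge through the entire data mining process, further enriching the theory of domain-driven data mining. To address the shortcomings of the traditional process model CRISP-DM, it proposes a new domain-driven data mining process model and introduces the meta-synthesis systems methodology to guide that process.
2. It proposes an improved, semantics-based Apriori algorithm that integrates domain knowledge into the mining algorithm, meeting mining needs at different levels and for different purposes.
3. It studies how to obtain experts' consensus knowledge from domain expert discussions. After analyzing the characteristics of expert knowledge, it builds an expert knowledge model. For consensus analysis, it uses correspondence analysis to cluster along two dimensions at once, experts and expert opinions, and maps both onto a two-dimensional plane, revealing cluster structure among experts and between experts and opinions. It also improves a bipartite network projection (compression) algorithm and applies it to compute the similarity of expert opinions, describing the similarity and independence of those opinions quantitatively.
4. On this theoretical basis, it designs and develops a domain-knowledge-driven knowledge discovery platform and uses it to mine the academic thought of renowned veteran practitioners of traditional Chinese medicine from two knowledge carriers, data and experts. The empirical results demonstrate the feasibility and advantages of the domain-driven knowledge discovery approach.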
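
The abstract does not say how the semantics-based Apriori variant encodes domain knowledge. A minimal sketch, assuming the knowledge arrives as a concept hierarchy (taxonomy) and borrowing the generalized-association-rule idea of extending each transaction with its ancestor concepts, shows one way such knowledge can enter Apriori's candidate generation and pruning steps. The function names, the taxonomy, and the herb data are hypothetical illustrations, not the thesis's algorithm.

```python
from itertools import combinations


def ancestors(item, taxonomy):
    """Return every ancestor of `item` in a child -> parent taxonomy (a tree)."""
    found = set()
    while item in taxonomy:
        item = taxonomy[item]
        found.add(item)
    return found


def extend(transaction, taxonomy):
    """Add all ancestor concepts of each item (a generalized transaction)."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, taxonomy)
    return frozenset(extended)


def semantic_apriori(transactions, taxonomy, min_support):
    """Hypothetical taxonomy-aware Apriori; returns {itemset: support}."""
    data = [extend(t, taxonomy) for t in transactions]
    n = len(data)

    def support(itemset):
        return sum(1 for t in data if itemset <= t) / n

    def redundant(itemset):
        # Domain constraint: an item paired with its own ancestor is trivial.
        return any(ancestors(i, taxonomy) & itemset for i in itemset)

    items = {i for t in data for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, and no candidate
        # may violate the taxonomy constraint.
        candidates = {
            c for c in candidates
            if not redundant(c)
            and all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


# Hypothetical herb taxonomy and prescriptions (illustrative data only).
taxonomy = {
    "ginseng": "qi_tonic", "astragalus": "qi_tonic",
    "angelica": "blood_tonic",
    "qi_tonic": "tonic", "blood_tonic": "tonic",
}
prescriptions = [
    {"ginseng", "angelica"},
    {"astragalus", "angelica"},
    {"ginseng"},
    {"licorice"},
]
frequent = semantic_apriori(prescriptions, taxonomy, min_support=0.5)
```

On this toy data no single qi tonic co-occurs with angelica often enough (each pair has support 0.25), yet the generalized itemset {angelica, qi_tonic} reaches support 0.5, the kind of higher-level pattern a run over raw items alone would miss. Pruning itemsets that pair an item with its own ancestor keeps trivially true patterns such as {ginseng, qi_tonic} out of the result.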

【Abstract】 Data mining is a common knowledge discovery process that extracts potentially useful knowledge from massive data. Traditional data mining, however, is data-driven and emphasizes automation of the mining process, so its results often include much redundant or even wrong knowledge that cannot be applied directly to real-world business activities. Data mining that aims at knowledge users find interesting and can act on should embed domain knowledge, especially experts' preferences, experience, knowledge, and wisdom, throughout the mining process, transforming data-driven data mining into domain-driven data mining and bridging the gap between academic research and real-world application. On the other hand, much knowledge is stored in human minds, especially those of experts with rich theoretical and practical experience. To solve complex problems, experts should therefore be treated as direct mining targets. In this way, experts' consensus knowledge on complex issues can be obtained, and by combining the two knowledge discovery modes so that they complement and verify each other, more comprehensive and accurate knowledge can be gained. Grounded in management science, computer science, and meta-synthesis, this dissertation calls the acquisition of interesting, actionable knowledge from both data and experts Domain-Driven Knowledge Discovery. Its tasks are:

1. Based on an analysis of the shortcomings of traditional data mining, it studies theories and methods for integrating domain knowledge into the whole data mining process, further enriching domain-driven data mining theory. To address the deficiencies of the traditional data mining process model CRISP-DM, it proposes a new Domain-Driven Data Mining process model and introduces the meta-synthesis (comprehensive integration) system methodology to guide the Domain-Driven Data Mining process.
2. It proposes an improved, semantics-based Apriori algorithm that integrates domain knowledge into the mining algorithm to meet mining demands at different levels and for various purposes.
3. It studies how to obtain expert consensus during discussion. After analyzing the characteristics of expert knowledge, it builds an expert knowledge model and uses correspondence analysis to cluster experts and experts' views simultaneously, mapping both onto a two-dimensional plane to reveal relationships among experts and between experts and their views. In addition, an improved weighted bipartite network projection algorithm is applied to calculate the similarity of experts' opinions, describing their similarity and independence quantitatively.
4. Finally, it designs and develops a domain-knowledge-driven knowledge discovery platform and applies it to mine the academic thought of veteran practitioners of Traditional Chinese Medicine (TCM) from both data and experts. The empirical results show the feasibility and advantages of domain-driven knowledge discovery.
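
The abstract mentions an improved weighted bipartite network projection for measuring expert-opinion similarity but does not describe the improvement. As a hedged illustration, the sketch below implements only the standard resource-allocation projection of Zhou et al. (2007), in which each opinion splits its resource equally among the experts who hold it and each expert splits its resource equally among its opinions. The function name `project_experts` and the toy expert-opinion matrix are hypothetical.

```python
def project_experts(adjacency):
    """Standard resource-allocation projection of an expert-opinion network.

    adjacency[i][l] is 1 if expert i endorses opinion l, else 0.
    Returns W with W[i][j] = (1 / k_j) * sum_l a[i][l] * a[j][l] / k_l,
    the share of expert j's resource that reaches expert i through
    shared opinions (k_j: expert degree, k_l: opinion degree).
    """
    n_experts = len(adjacency)
    n_opinions = len(adjacency[0])
    expert_degree = [sum(row) for row in adjacency]
    opinion_degree = [
        sum(adjacency[i][l] for i in range(n_experts)) for l in range(n_opinions)
    ]
    w = [[0.0] * n_experts for _ in range(n_experts)]
    for i in range(n_experts):
        for j in range(n_experts):
            if expert_degree[j] == 0:
                continue  # an expert with no opinions distributes nothing
            total = sum(
                adjacency[i][l] * adjacency[j][l] / opinion_degree[l]
                for l in range(n_opinions)
                if opinion_degree[l] > 0
            )
            w[i][j] = total / expert_degree[j]
    return w


# Hypothetical toy data: 3 experts x 4 opinions.
A = [
    [1, 1, 0, 0],  # expert 0
    [1, 1, 1, 0],  # expert 1
    [0, 0, 0, 1],  # expert 2 holds an opinion nobody else shares
]
W = project_experts(A)
```

Each column of W sums to 1 for every expert with at least one opinion, and the projection is asymmetric: W[1][0] is 0.5 while W[0][1] is 1/3, because expert 1 spreads its resource over more opinions. Expert 2 shares no opinion with anyone and keeps all of its resource (W[2][2] == 1.0); reading that retained share as a sign of independence is one plausible interpretation, not necessarily the thesis's definition.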