
Research on Framework of Ontology Based Data Cleaning System

【Author】 Zhang Lianchao

【Supervisor】 Huang Zhiqiu

【Author Information】 Nanjing University of Aeronautics and Astronautics, Computer Application Technology, 2008, Master's thesis

【Abstract】 With the rapid development of database technology and the diversification of data acquisition methods, data resources have become increasingly rich and data volumes have grown dramatically. The value of data lies in its quality: decision support based on poor-quality data cannot be trusted, and the huge volume of disorganized, poor-quality data has become a "bottleneck" restricting data applications. As the main technique for addressing data quality problems, data cleaning has therefore become a research hotspot. However, most existing data cleaning research works at the level of the data's textual values and tends to ignore the semantic information the data carries. How to introduce semantics into data cleaning has thus become a new research topic in the field. Addressing this topic, the main contributions of this thesis are as follows.

First, against the background of informatization, the thesis studies data quality and data cleaning. By analyzing the state of research in China and abroad, it summarizes the shortcomings of existing data cleaning research and argues that ontologies and related technologies can feasibly address them. Second, it surveys knowledge representation and its conventional methods, together with ontologies and related technologies, as the theoretical foundation of the work. Third, it proposes an ontology-based data cleaning system framework. Following the characteristics of resource description, the framework is divided into an ontology expression model, which describes static semantic information, and a dynamic processing model, which describes process semantic information. The thesis gives a formal description of each component of these models, together with the working principles and implementation mechanisms of the main modules during processing. Finally, building on the analysis of the two semantic models, it designs and implements the ontology-based data cleaning system framework and models the framework's static structure and dynamic behavior semantics in UML. The framework addresses two gaps in existing data cleaning research: the lack of semantic constraints and the lack of support for automated reasoning.
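The contrast between text-level cleaning and cleaning driven by a static ontology model plus a dynamic processing model can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Ontology` class, the `clean` function and the sample records are hypothetical, and the sketch assumes an ontology reduced to a concept hierarchy, a synonym table and property range constraints, with automated reasoning approximated by walking the subclass chain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    """Hypothetical static model: concept hierarchy, synonyms and property range constraints."""
    parents: dict = field(default_factory=dict)      # concept -> direct superclass
    synonyms: dict = field(default_factory=dict)     # lower-cased surface form -> canonical term
    ranges: dict = field(default_factory=dict)       # property -> concept its values must belong to
    instance_of: dict = field(default_factory=dict)  # canonical term -> asserted concept

    def canonical(self, value: str) -> str:
        # Map a raw text value to its canonical term; unknown values pass through trimmed.
        return self.synonyms.get(value.strip().lower(), value.strip())

    def is_a(self, concept: Optional[str], target: str) -> bool:
        # Subsumption check by walking the superclass chain (transitive subClassOf).
        while concept is not None:
            if concept == target:
                return True
            concept = self.parents.get(concept)
        return False


def clean(records, onto):
    """Hypothetical dynamic model: normalize, check semantic constraints, remove duplicates."""
    cleaned, violations, seen = [], [], set()
    for i, rec in enumerate(records):
        # Step 1: replace each value with its canonical ontology term.
        norm = {prop: onto.canonical(val) for prop, val in rec.items()}
        # Step 2: a value violates a range constraint unless its concept is subsumed by the range.
        bad = [(i, prop, val) for prop, val in norm.items()
               if prop in onto.ranges
               and not onto.is_a(onto.instance_of.get(val), onto.ranges[prop])]
        if bad:
            violations.extend(bad)
            continue
        # Step 3: detect duplicates on the normalized values rather than the raw text.
        key = tuple(sorted(norm.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(norm)
    return cleaned, violations


FULL_NAME = "Nanjing University of Aeronautics and Astronautics"
onto = Ontology(
    parents={"University": "Organization", "Company": "Organization"},
    synonyms={"nuaa": FULL_NAME},
    ranges={"affiliation": "Organization"},
    instance_of={FULL_NAME: "University"},
)
records = [
    {"name": "Zhang San", "affiliation": "NUAA"},
    {"name": "Zhang San", "affiliation": FULL_NAME},
    {"name": "Li Si", "affiliation": "Computer Science"},
]
cleaned, violations = clean(records, onto)
print(cleaned)     # [{'name': 'Zhang San', 'affiliation': 'Nanjing University of Aeronautics and Astronautics'}]
print(violations)  # [(2, 'affiliation', 'Computer Science')]
```

Compared as raw strings, the first two records differ ("NUAA" versus the full name), so a purely text-level duplicate check would keep both; once values are mapped to canonical ontology terms they collapse into one record. The third record is flagged because "Computer Science" is not asserted to be an instance of any subclass of `Organization`. A fuller system might express the static model in a standard ontology language such as OWL and delegate subsumption checks to a reasoner.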
