The Research on the Integrated Knowledge Mining System for the Pollution in the Agro-Product Area

【Author】 Zheng Xiangqun (郑向群)

【Supervisor】 Zhao Zheng (赵政)

【Author information】 Tianjin University, Computer Application Technology, 2009, Ph.D.

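【Abstract】 This project was undertaken to make integrated use of pollution data from agro-product producing areas and to support the analysis and assessment of that pollution. Its goal is to perform knowledge mining on pollution monitoring results for agro-product producing areas, together with the related spatial data, and to build an integrated knowledge mining system for producing-area pollution. The system has four components: a data cleaning system, a non-spatial predicate mining system, a spatial predicate extraction system, and a spatial/non-spatial association rule mining system.

The producing-area pollution data are cleaned by attribute cleaning and duplicate-record cleaning. For attribute cleaning, the thesis proposes four methods: a statistical-analysis method, a clustering method, a pattern-based method, and an association-rule method. For duplicate records, DBSCAN clustering extracts sets of similar duplicate records, and an ant colony algorithm then merges and deletes the duplicates. Together these form a new data cleaning method.

Non-spatial predicate extraction for producing-area soil pollution has two parts: extracting non-spatial background knowledge and extracting the atomic proposition set for producing-area pollution. First, non-spatial background knowledge is obtained by relational calculus, as Cartesian products formed over relations (tuples, attributes). Then a new method predicts and assesses producing-area pollution and extracts the atomic proposition set. Principal component analysis (PCA) reduces the dimensionality of the pollution data, and an RBF network assesses and predicts pollution status. The SWM similar-weight method then extracts rules, from which the atomic proposition set is formed.

The thesis also establishes a new method for spatial predicate extraction. It introduces the concept of hierarchical mining of spatial objects and improves the existing 9-intersection matrix method for extracting spatial predicates. Rough set theory is used to create a rough 9-intersection matrix, and a CART decision tree completes the extraction. Finally, constraint rules merge the spatial predicates, so that the resulting hierarchical predicate space is compact without losing information. This lays the groundwork for the subsequent association rule mining.

The SPADA algorithm is introduced to mine spatial/non-spatial association rules. Spatial observations are built from the non-spatial and spatial predicate sets. Intra-level and inter-level searches are then conducted over the hierarchy using θ-subsumption, yielding spatial/non-spatial association rules. Pattern constraints and association rule constraints are also defined to speed up searching and pruning.

Finally, a working instance of the integrated knowledge mining system was built. Using producing-area pollution monitoring data from Daye, Hubei, the thesis validated four components: the data cleaning algorithms, the soil-pollution non-spatial predicate extraction algorithm, the spatial predicate extraction algorithm, and the spatial/non-spatial association rule mining. The results show that the knowledge mined by the system reflects the current state of producing-area pollution in the region reasonably well.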
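The abstract lists a statistical-analysis method for attribute cleaning but gives no details. The following is a minimal sketch of one possible statistical rule, not the thesis's method. It flags anomalous readings in a single attribute using the modified z-score (median and median absolute deviation, threshold 3.5) and replaces them with the attribute median. The statistic, the threshold, the median-replacement policy, and the sample cadmium readings are all assumptions made for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median (assumed rule, not the thesis's).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values equal the median, so MAD gives no scale;
        # this sketch flags nothing rather than dividing by zero.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


def clean_attribute(values, threshold=3.5):
    """Copy of `values` with flagged outliers replaced by the median (assumed policy)."""
    med = median(values)
    bad = set(flag_outliers(values, threshold))
    return [med if i in bad else v for i, v in enumerate(values)]


# Hypothetical cadmium readings (mg/kg) from one sampling campaign.
cd = [0.21, 0.25, 0.19, 0.23, 0.22, 0.24, 2.10, 0.20]
print(flag_outliers(cd))        # [6]
print(clean_attribute(cd)[6])   # replaced by the median of cd
```

The median and MAD are used instead of the mean and standard deviation because a single extreme reading inflates the standard deviation enough to hide itself in small samples.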

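The duplicate-record step described above first groups similar records with DBSCAN and only then merges them. Below is a minimal stdlib DBSCAN, assuming each record is already a numeric vector scaled so that Euclidean distance is meaningful. The `eps` and `min_pts` values and the sample records are hypothetical. The merge function simply averages each group. It stands in for the ant colony merging described in the abstract and does not implement it.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return one cluster label per point (NOISE for outliers)."""
    labels = [None] * len(points)

    def region(i):
        # Indices within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = region(j)
            if len(j_seeds) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_seeds)
    return labels


def duplicate_groups(records, eps):
    """Groups of record indices that DBSCAN places in the same cluster."""
    groups = {}
    for idx, label in enumerate(dbscan(records, eps, min_pts=2)):
        if label != NOISE:
            groups.setdefault(label, []).append(idx)
    return list(groups.values())


def merge_by_mean(records, group):
    """Stand-in merge (NOT the ant colony method): average each field over the group."""
    return tuple(sum(records[i][k] for i in group) / len(group)
                 for k in range(len(records[group[0]])))


# Hypothetical, pre-scaled feature vectors; records 0 and 1 are near-duplicates.
records = [(0.30, 0.90, 0.52), (0.30, 0.90, 0.53), (0.55, 0.80, 0.10), (0.70, 0.20, 0.30)]
groups = duplicate_groups(records, eps=0.05)
print(groups)                            # [[0, 1]]
print(merge_by_mean(records, groups[0]))
```

With `min_pts=2`, any pair of records within `eps` of each other forms a candidate duplicate set, and isolated records are labelled as noise and left alone.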
【Abstract】 With the development of China's economy, serious pollution in agro-product producing areas has attracted wide public attention. To make better use of the pollution data, we carried out in-depth research on pollution analysis and evaluation, and this thesis presents the results. Specifically, the research built an integrated knowledge mining system for pollution in agro-product areas, which applies a knowledge mining process to pollution monitoring results and the corresponding spatial data. The system consists of four parts: a data cleaning system, a non-spatial predicate mining system, a spatial predicate extraction system, and a spatial/non-spatial association rule mining system.

The data from polluted agro-product areas are cleaned in two ways: attribute cleaning and duplicate-record cleaning. For attribute cleaning, statistical, clustering, pattern-based, and association-rule methods were discussed, and one of them was selected as the optimal method. A new technique was developed to clean duplicate records. The DBSCAN clustering method extracts similar duplicate records, and an ant colony algorithm then merges and deletes them.

Non-spatial predicate extraction is split into two sub-tasks: extracting non-spatial background knowledge and extracting the atomic proposition set. First, after relational analysis, non-spatial background knowledge is extracted as a Cartesian product built over relation(tuple, attribute). The pollution in the agro-product area is predicted and evaluated while the atomic proposition set is being extracted. Principal Component Analysis reduces the dimensionality of the pollution data, and an RBF neural network produces the prediction and evaluation. The Similar Weight Method then extracts rules that form the atomic proposition set.

This thesis also presents a new technique for extracting spatial predicates. First, the concept of a spatial object hierarchy is introduced. Rough set theory is then applied to the 9-intersection model to build a new rough 9-intersection matrix, on which a CART decision tree extracts the spatial predicates. Merging the spatial rules under a constraint bias then yields a refined spatial predicate space.

To mine spatial association rules, the SPADA algorithm is introduced. Spatial observations are built from the non-spatial and spatial predicate spaces. Intra-level and inter-level searches are carried out according to θ-subsumption over the hierarchical structure of the predicate space, and the resulting set of spatial association rules is presented. Pattern and rule constraint biases are also applied to speed up searching and pruning.

The last part of the thesis describes a practical example showing how spatial analysis can be performed on pollution data from Daye, Hubei Province. All the algorithms above were tested on this data. The results show that the integrated knowledge mining system works well and that the mining results are satisfactory.
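The spatial predicates described above build on the 9-intersection model. This model records whether the interior, boundary, and exterior of one region intersect the interior, boundary, and exterior of another. The sketch below classifies a crisp 9-intersection matrix into one of Egenhofer's eight region-region relations and emits it as a predicate string. It does not implement the rough 9-intersection matrix or the CART step. The matrix is read from a computed DE-9IM string, such as the one Shapely's `relate()` method returns, where `F` marks an empty intersection. The example strings and object names are hand-written and hypothetical.

```python
# Egenhofer's eight region-region relations as 9-intersection patterns.
# Rows: interior, boundary, exterior of A; columns: the same for B.
# 1 = non-empty intersection, 0 = empty.
RELATIONS = {
    ((0, 0, 1), (0, 0, 1), (1, 1, 1)): "disjoint",
    ((0, 0, 1), (0, 1, 1), (1, 1, 1)): "meet",
    ((1, 1, 1), (1, 1, 1), (1, 1, 1)): "overlap",
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)): "equal",
    ((1, 0, 0), (1, 0, 0), (1, 1, 1)): "inside",
    ((1, 1, 1), (0, 0, 1), (0, 0, 1)): "contains",
    ((1, 0, 0), (1, 1, 0), (1, 1, 1)): "coveredBy",
    ((1, 1, 1), (0, 1, 1), (0, 0, 1)): "covers",
}


def matrix_from_de9im(code):
    """Turn a computed DE-9IM string (chars F/0/1/2) into a 0/1 matrix.

    'F' means an empty intersection; any dimension digit means non-empty.
    Pattern wildcards such as '*' or 'T' are not meant as input here.
    """
    if len(code) != 9:
        raise ValueError("DE-9IM strings have exactly 9 characters")
    bits = [0 if c == "F" else 1 for c in code]
    return tuple(tuple(bits[r * 3:(r + 1) * 3]) for r in range(3))


def spatial_predicate(code, a="A", b="B"):
    """Return e.g. 'meet(A, B)', or None if the matrix is not one of the eight."""
    relation = RELATIONS.get(matrix_from_de9im(code))
    return None if relation is None else f"{relation}({a}, {b})"


print(spatial_predicate("FF2F11212", "site_17", "mine_3"))   # meet(site_17, mine_3)
print(spatial_predicate("2FF1FF212", "plot_4", "zone_1"))    # inside(plot_4, zone_1)
```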

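SPADA's search compares patterns by θ-subsumption. A clause C θ-subsumes D when some substitution θ of C's variables makes every literal of Cθ appear in D. The brute-force backtracking check below illustrates that test only. It does not cover SPADA's refinement operators, support counting over reference objects, or the constraint biases. Literals are tuples, and variables are strings that start with an uppercase letter; this is a Prolog-style convention assumed here. The predicate names `site`, `close_to`, `factory`, and `cd_level` and the observation are hypothetical.

```python
def is_var(term):
    """Prolog-style convention (assumed): variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def match_literal(lit, target, theta):
    """Extend substitution theta so that lit under theta equals target; None if impossible."""
    if lit[0] != target[0] or len(lit) != len(target):
        return None
    extended = dict(theta)
    for arg, value in zip(lit[1:], target[1:]):
        if is_var(arg):
            if extended.get(arg, value) != value:
                return None          # variable already bound to something else
            extended[arg] = value
        elif arg != value:
            return None              # constants must match exactly
    return extended


def theta_subsumes(c, d, theta=None):
    """Return a substitution theta with c·theta ⊆ d, or None (backtracking search)."""
    theta = {} if theta is None else theta
    if not c:
        return theta
    first, rest = c[0], c[1:]
    for target in d:
        extended = match_literal(first, target, theta)
        if extended is not None:
            result = theta_subsumes(rest, d, extended)
            if result is not None:
                return result
    return None


# Hypothetical spatial observation for one reference object.
observation = [("site", "s1"), ("factory", "f1"), ("close_to", "s1", "f1"),
               ("cd_level", "s1", "high"), ("site", "s2"), ("cd_level", "s2", "low")]
p_high = [("site", "S"), ("close_to", "S", "F"), ("factory", "F"), ("cd_level", "S", "high")]
p_low = [("site", "S"), ("cd_level", "S", "low"), ("close_to", "S", "F")]

print(theta_subsumes(p_high, observation))  # {'S': 's1', 'F': 'f1'}
print(theta_subsumes(p_low, observation))   # None
```

The same test orders patterns by generality. For example, `[("site", "S"), ("close_to", "S", "F")]` θ-subsumes `p_high`, but not the other way round. This ordering is what makes level-wise search and pruning possible.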
  • 【Online publication submitted by】 Tianjin University
  • 【Online publication year and issue】 2010, Issue 12