Research on Content-based Satellite Cloud Image Mining Methods

Research on Content-based Satellite Cloud Image Data Mining Technology

【Author】 Lai Xu (来旭)

【Supervisor】 Li Guohui (李国辉)

【Author Information】 National University of Defense Technology, Control Science and Engineering, 2010, Ph.D.

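【Abstract (Chinese version)】 Satellite cloud images reflect the characteristics and evolution of all kinds of cloud systems comprehensively, promptly and dynamically, and have become an indispensable reference for meteorological and water-resources departments when making flood-control and drought-relief decisions. Years of image reception have accumulated an enormous archive of satellite cloud images, and manual interpretation cannot meet timeliness requirements. Although some artificial-intelligence methods can analyze the data automatically, they only execute preset rules and cannot actively discover the knowledge hidden in the data. Image mining, a frontier of data-mining research, provides theories and methods for obtaining implicit, valuable and understandable knowledge from large image collections. Guided by the theory and methods of image mining, this thesis designs three kinds of mining tasks for cloud-image datasets and for mixed cloud-image/rainfall datasets; the knowledge obtained is of significant value for research on intelligent cloud-image understanding and on precipitation forecasting from satellite cloud images. The research work and contributions of the thesis are as follows.

(1) Cloud-image preprocessing. A new nonlinear adaptive noise-filtering algorithm is proposed. Compared with the commonly used median filter, it effectively removes salt-and-pepper noise while leaving non-noise pixels unaffected, so that pixel values faithfully reflect the state inside the cloud. Cloud images also carry annotation objects such as latitude/longitude lines and place names, which interfere with the extraction of cloud-image feature parameters. Based on the shape of these annotation objects, an annotation-removal algorithm based on total variation (TV) is proposed, which improves the discretization of the TV model by introducing weights. Results show that the algorithm removes annotation objects effectively while preserving neighborhood information.

(2) Extraction of regions of interest. A weighted clustering algorithm based on the cloud-image histogram is proposed and used to extract typical cloud regions. To better match the distribution of cloud-image samples in feature space, the thesis focuses on three improvements to the clustering algorithm: 1) for adaptive determination of the number of clusters, a genetic algorithm combined with an evaluation-index curve finds the optimal number of clusters, raising the degree of automation; 2) for the similarity measure, a chain-distance (link-distance) measure is proposed, which overcomes the sensitivity of the Euclidean distance to the data distribution; 3) for the clustering mechanism, a semi-supervised approach is introduced, which avoids both the blindness of pure clustering and the training-sample requirements of classification. Clustering histograms instead of image pixels greatly reduces processing time.

(3) Intelligent cloud-type recognition. Algorithms and models are proposed for three problems: feature extraction, feature selection and the classification model. Because cloud regions are irregular, a "base-circle model" description method is proposed to describe them, and morphological (shape) features of clouds are extracted on this basis, overcoming the limitation of earlier algorithms that could only extract color, texture and similar features. To avoid overfitting, a feature-curve analysis method selects the input feature set of the classification model from the candidate features. The "IPSO-BP network model" is proposed as the classification model: it uses an improved particle swarm optimization (PSO) algorithm instead of back-propagation as the learning algorithm of the BP neural network, which to some extent overcomes slow convergence, the tendency to fall into local minima and excessive dependence on the choice of initial values. To make better use of multiple features within the original classification framework, a combined classification model based on multi-feature fusion is proposed: feature subsets are fed to separate sub-classifiers, each of which makes a local decision, and the local decisions are fused by voting to give the final result. Results show that the multi-feature fusion model achieves higher classification accuracy than a single classification model.

(4) Association-rule mining on the mixed cloud-image/rainfall dataset. Based on the relationship between cloud-image gray level and cloud-top brightness temperature, four cloud-state parameters closely related to rainfall are designed. After spatio-temporal synchronization, the cloud-image parameters and the rainfall data form a unified mixed dataset. To convert numerical attributes, a clustering-based partitioning method for numerical attributes is proposed, which overcomes the sensitivity of the equal-depth partitioning method to data skew. To process large mixed cloud-image/rainfall datasets efficiently, a two-phase association-rule mining algorithm based on data partitioning is proposed: the original database is divided into several independent intervals, the local frequent itemsets of each interval generate the global candidate itemsets, and a dedicated data structure, tidlists, is designed for support counting. These strategies effectively reduce the number of database scans and greatly improve efficiency. Results show that when the support threshold is low, the proposed algorithm runs markedly faster than Apriori.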

【Abstract】 Satellite cloud images, which reflect the characteristics and evolution of cloud systems, have become a very important reference for flood and drought forecasting. The cloud image archive is a huge information resource, and interpreting these images manually is difficult and time-consuming. Although some artificial intelligence techniques can analyze the data automatically, they cannot extract the latent knowledge embedded in the data, because they strictly follow predefined rules. As a frontier of data-mining research, image mining (IM) provides theories and methods for extracting implicit, potentially useful and comprehensible knowledge from cloud image archives. This thesis designs three kinds of image mining tasks for cloud image datasets and for mixed datasets of cloud images and rainfall data; the acquired knowledge is valuable for cloud image content understanding and rainfall forecasting. The primary work includes the following aspects.

(1) Cloud image preprocessing. A new nonlinear adaptive de-noising algorithm is proposed. Compared with the classical median filter, it performs better at removing salt-and-pepper noise while leaving non-noise pixels unchanged, which guarantees that the pixel values truly reflect the state of the cloud. Cloud images also contain labeling objects, such as longitude and latitude lines and place names, which may interfere with feature extraction. Since most labeling objects are line-shaped, a total variation (TV) based labeling-object elimination algorithm is proposed, which improves the discretization by introducing weights. Experiments show that the improved algorithm efficiently eliminates the labeling objects while preserving the neighborhood information.

(2) ROI extraction. The thesis develops a weighted clustering method based on the cloud image histogram and uses it to extract representative cloud regions. To better fit the distribution of the samples in the feature space, three improvement strategies for the clustering algorithm are studied. The first concerns the automatic determination of the number of clusters: a genetic algorithm combined with an evaluation-index curve determines the optimal number of clusters, which raises the level of automation. The second concerns the similarity measure: a link-distance similarity measure is proposed which, unlike the Euclidean distance, is not sensitive to the data distribution. The third concerns the clustering mechanism: a semi-supervised method is proposed, which overcomes the blindness of pure clustering and also avoids the large number of training samples that classification requires. Finally, to reduce processing time, the histogram of the cloud image, rather than the original pixels, is used as the clustering object.

(3) Cloud classification. Three algorithms are proposed, corresponding to feature extraction, feature selection and the classification model. To deal with the irregularity of cloud regions, the 'basic round description model' is proposed to describe them. Shape features are extracted based on this model and combined with color and texture features to form a comprehensive candidate feature set. A 'BP-IPSO' model is also proposed, which uses an improved particle swarm optimization (PSO) algorithm instead of back-propagation as the learning algorithm of the BP neural network. The BP-IPSO model alleviates slow convergence, the tendency to fall into local minima, and sensitivity to the initial values. To fuse multiple features efficiently within the original classification framework, a multi-feature combination classification model is developed, in which the local decisions made by the sub-classifiers are combined into the final decision by voting. Compared with a single classification model, the combined model achieves higher accuracy.

(4) Association rule mining on the mixed cloud-rainfall data. Four parameters reflecting the cloud state are introduced; they are based on the relation between the gray level of the cloud image and the cloud-top temperature. Through spatio-temporal synchronization, the cloud state parameters and the rainfall data form a unified mixed dataset. To transform numerical attributes, a clustering-based numerical attribute partition algorithm is proposed, which addresses the sensitivity of the equal-depth partition method to data skew. To improve processing efficiency on large-scale cloud-rainfall mixed datasets, a 'Two-step Association Rules Mining based on Data Partition' algorithm is proposed. The algorithm first partitions the database into several independent intervals, and the global candidate itemsets are generated from the local large itemsets of each interval. A 'tidlists' data structure is designed for support counting. These strategies improve efficiency by reducing the number of database scans. Experimental results show that the algorithm runs much faster than Apriori when the support threshold is low.
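Contribution (1) contrasts the proposed nonlinear adaptive filter with a plain median filter, whose drawback is that it also alters uncorrupted pixels. The thesis's exact filter is not described in this record; the sketch below is a standard switching (noise-detecting) median filter, assuming 8-bit images whose salt-and-pepper noise takes exactly the values 0 and 255. The function name `switching_median` and its parameters are illustrative.

```python
from statistics import median


def switching_median(img, low=0, high=255, max_radius=3):
    """Switching median filter (illustrative sketch, not the thesis's filter).

    img is a list of rows of integer gray levels. Only pixels equal to `low`
    or `high` (assumed salt/pepper values) are replaced; every other pixel is
    copied through unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] not in (low, high):
                continue  # non-noise pixel: keep its original value
            # Grow the window until it holds at least one non-noise neighbour.
            for r in range(1, max_radius + 1):
                vals = [img[j][i]
                        for j in range(max(0, y - r), min(h, y + r + 1))
                        for i in range(max(0, x - r), min(w, x + r + 1))
                        if img[j][i] not in (low, high)]
                if vals:
                    out[y][x] = median(vals)
                    break
            # If no clean neighbour exists within max_radius, the pixel is left as is.
    return out
```

Only pixels at 0 or 255 change, so the gray levels that encode cloud-top brightness pass through untouched; the trade-off is that a genuine reading of exactly 0 or 255 would also be treated as noise.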
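The labeling-object removal in contribution (1) is based on total variation (TV). As a hedged illustration, the following fills a label mask by explicit gradient descent on a smoothed, edge-wise (anisotropic) TV energy. It does not reproduce the thesis's weighted discretization, although the per-edge term in the inner loop is where such weights would enter. It assumes the mask of label pixels (grid lines, place names) is already known; `tv_inpaint`, `dt` and `eps` are illustrative names.

```python
import math


def tv_inpaint(img, mask, iters=1000, dt=0.2, eps=1.0):
    """Fill masked pixels by gradient descent on a smoothed edge-wise TV energy.

    img  : list of rows of gray levels
    mask : same shape, True where a label pixel must be filled in
    Pixels outside the mask are never modified. With eps=1 and dt<=0.25 each
    update is a convex combination of the pixel and its neighbours, so the
    explicit scheme stays stable.
    """
    h, w = len(img), len(img[0])
    u = [[float(v) for v in row] for row in img]

    def px(y, x):
        # Replicate the border so there is no flux across the image edge.
        return u[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    for _ in range(iters):
        new = [row[:] for row in u]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                c = u[y][x]
                flux = 0.0
                for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                    d = px(y + dy, x + dx) - c
                    # Derivative of sqrt(d^2 + eps^2) on this edge: large jumps
                    # get a bounded pull, which is what keeps edges sharp.
                    flux += d / math.sqrt(d * d + eps * eps)
                new[y][x] = c + dt * flux
        u = new
    return u
```

Because the per-edge pull is bounded, a bright label far from its surroundings fades at a roughly constant rate rather than being smeared outward, which is why TV suits thin line-shaped labels.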
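Contribution (2) clusters the cloud-image histogram instead of the pixels. One reading of "weighted clustering on the histogram" is a 1-D k-means over gray levels in which each level is weighted by its pixel count; this is an assumption, and the thesis's weighting may differ. Each iteration then costs O(256·k) instead of O(pixels·k). The names `gray_histogram` and `histogram_kmeans` are hypothetical.

```python
def gray_histogram(img, levels=256):
    """Count how many pixels take each gray level."""
    hist = [0] * levels
    for row in img:
        for g in row:
            hist[g] += 1
    return hist


def histogram_kmeans(hist, k, iters=100):
    """Weighted 1-D k-means over gray levels; level g has weight hist[g]."""
    levels = [g for g, n in enumerate(hist) if n > 0]
    lo, hi = levels[0], levels[-1]
    # Start with centres spread evenly over the occupied gray-level range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = {}
    for _ in range(iters):
        labels = {g: min(range(k), key=lambda c: abs(g - centres[c])) for g in levels}
        new = []
        for c in range(k):
            members = [g for g in levels if labels[g] == c]
            weight = sum(hist[g] for g in members)
            # An empty cluster keeps its previous centre.
            new.append(sum(g * hist[g] for g in members) / weight if weight else centres[c])
        if new == centres:
            break
        centres = new
    return centres, labels
```

Pixel-level labels then follow by table lookup, e.g. `[[labels[g] for g in row] for row in img]`.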
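The link-distance measure of contribution (2) is not defined in this record. A well-known chain-based measure intended to be less sensitive to cluster shape than the Euclidean distance is the minimax path distance: the distance between two samples is the smallest possible largest hop over any chain of samples connecting them. The sketch computes it with a Floyd-Warshall-style pass; treating it as the thesis's measure is an assumption, and `minimax_path_distance` is a hypothetical name.

```python
def minimax_path_distance(points, dist):
    """d*(i, j) = min over chains i -> ... -> j of the largest single hop.

    `dist` is any base distance (e.g. Euclidean). Runs in O(n^3), so it is
    meant for small sample sets such as histogram bins.
    """
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going through k costs the worse of the two legs.
                via = max(d[i][k], d[k][j])
                if via < d[i][j]:
                    d[i][j] = via
    return d
```

Samples lying along one continuous chain end up close to each other even when their endpoints are far apart, while a gap between groups dominates every chain that crosses it.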
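For the semi-supervised mechanism in contribution (2), one common form is seeded (constrained) k-means: a few labeled samples fix the initial centers and keep their labels, while unlabeled samples are clustered freely. This illustrates the idea only and is not presented as the thesis's algorithm; `seeded_kmeans` is a hypothetical name.

```python
import math


def seeded_kmeans(points, seeds, iters=100):
    """Constrained k-means.

    points : list of equal-length tuples
    seeds  : dict {point_index: class_id}; every class id 0..k-1 must appear,
             and seeded points keep their given class throughout.
    """
    k = max(seeds.values()) + 1
    dim = len(points[0])

    def mean(idx):
        return tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))

    # Initial centres come from the labeled seeds, not from random picks.
    centres = [mean([i for i, c in seeds.items() if c == cl]) for cl in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [seeds[i] if i in seeds else
                      min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for i, p in enumerate(points)]
        if new_labels == labels:
            break
        labels = new_labels
        # Seeds guarantee that no cluster becomes empty.
        centres = [mean([i for i, l in enumerate(labels) if l == cl]) for cl in range(k)]
    return labels, centres
```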
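Contribution (3) trains a BP network with an improved PSO instead of back-propagation. The specific improvements are not given in this record, so the sketch below uses a linearly decreasing inertia weight, one widely used PSO variant, to search the flat weight vector of a small 2-3-1 network. The network size, parameter values and the names `ipso_minimise`, `mlp_forward` and `mse` are all assumptions.

```python
import math
import random

HIDDEN = 3                           # hypothetical hidden-layer size
N_WEIGHTS = HIDDEN * 3 + HIDDEN + 1  # (2 inputs + bias) per hidden unit, then output layer


def sigmoid(s):
    # Numerically safe logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def mlp_forward(wts, x):
    """2-HIDDEN-1 network: tanh hidden layer, sigmoid output; wts is a flat list."""
    h = []
    for u in range(HIDDEN):
        a, b, bias = wts[3 * u: 3 * u + 3]
        h.append(math.tanh(a * x[0] + b * x[1] + bias))
    out = wts[3 * HIDDEN: 3 * HIDDEN + HIDDEN]
    return sigmoid(sum(o * v for o, v in zip(out, h)) + wts[-1])


def mse(wts, data):
    return sum((mlp_forward(wts, x) - y) ** 2 for x, y in data) / len(data)


def ipso_minimise(loss, dim, n_particles=30, iters=300, w_max=0.9, w_min=0.4,
                  c1=1.49445, c2=1.49445, bound=5.0, seed=0):
    """PSO with a linearly decreasing inertia weight.

    Returns (best_position, best_loss, history of the best loss per iteration).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for t in range(iters):
        # Large inertia early (wide search), small inertia late (fine search).
        w = w_max - (w_max - w_min) * t / max(1, iters - 1)
        for i in range(n_particles):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-bound, min(bound, v))  # velocity clamp
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

Training then amounts to `ipso_minimise(lambda p: mse(p, data), dim=N_WEIGHTS)`. Because PSO needs only loss values, no gradients are computed; the price is many more loss evaluations than back-propagation uses per epoch.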
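The fusion step in contribution (3) feeds each feature subset to its own sub-classifier and combines the local decisions by voting. A minimal plurality-vote sketch follows; the feature names, cloud-type labels and the tie-breaking rule (the earliest-cast tied label wins) are illustrative assumptions, and `vote` and `fused_classify` are hypothetical names.

```python
from collections import Counter


def vote(decisions):
    """Plurality vote; a tie goes to whichever tied label was cast first."""
    counts = Counter(decisions)
    top = max(counts.values())
    return next(d for d in decisions if counts[d] == top)


def fused_classify(sample, sub_models):
    """sub_models: list of (feature_names, classifier) pairs.

    Each classifier sees only its own feature subset and returns a local
    decision; the local decisions are then fused by voting.
    """
    local = [clf({f: sample[f] for f in feats}) for feats, clf in sub_models]
    return vote(local), local
```

For example, with three hypothetical threshold classifiers on brightness, texture and compactness that return `"Cb"`, `"Ci"` and `"Cb"`, the fused decision is `"Cb"`.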
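Contribution (4) replaces equal-depth partitioning of numerical attributes with a clustering-based partition. The sketch contrasts the two on a rainfall-like, heavily skewed sample. The 1-D k-means with interval cut points at the midpoints between adjacent centers is an assumed instance of the idea, and the function names are hypothetical.

```python
from bisect import bisect_right


def equal_depth_cuts(values, k):
    """Cut points that put roughly len(values)/k values in each interval."""
    v = sorted(values)
    return [v[i * len(v) // k] for i in range(1, k)]


def cluster_cuts(values, k, iters=100):
    """1-D k-means; cut points are midpoints between adjacent centres."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in values:
            groups[min(range(k), key=lambda c: abs(x - centres[c]))].append(x)
        new = [sum(g) / len(g) if g else centres[c] for c, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    centres.sort()
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]


def to_items(values, cuts):
    """Map each numeric value to the index of its interval (a boolean item)."""
    return [bisect_right(cuts, x) for x in values]
```

On `[0] * 8 + [1, 2, 30, 35, 40, 80]` with three intervals, the equal-depth cuts are 0 and 2, which leaves the first interval empty and lumps 2 together with 30 to 80; the cluster-based cuts fall at about 17.65 and 57.5, separating near-zero, moderate and extreme values.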
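The two-step algorithm in contribution (4) follows the scheme of the classic Partition algorithm (Savasere, Omiecinski and Navathe): any itemset that is frequent in the whole database must be locally frequent in at least one partition, so the union of the local frequent itemsets is a complete candidate set, and one further pass counts their global support. The sketch keeps a tidlist (itemset → set of transaction ids) per itemset, so supports come from set intersections rather than rescans. It illustrates the general scheme only; the thesis's own tidlists layout may differ, and the function names are hypothetical.

```python
from itertools import combinations


def tidlists(transactions):
    """Map each single item (as a frozenset) to the ids of transactions containing it."""
    tl = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tl.setdefault(frozenset([item]), set()).add(tid)
    return tl


def local_frequent(transactions, min_sup):
    """Level-wise search inside one partition using tidlist intersections."""
    min_count = min_sup * len(transactions)
    level = {s: t for s, t in tidlists(transactions).items() if len(t) >= min_count}
    frequent = dict(level)
    while level:
        nxt = {}
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) != len(a) + 1 or cand in nxt:
                continue
            tids = level[a] & level[b]  # tidlist(a | b) = tidlist(a) & tidlist(b)
            if len(tids) >= min_count:
                nxt[cand] = tids
        frequent.update(nxt)
        level = nxt
    return frequent


def partition_mine(transactions, min_sup, n_parts):
    n = len(transactions)
    size = -(-n // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, n, size)]
    # Phase 1: every locally frequent itemset becomes a global candidate.
    candidates = set()
    for p in parts:
        candidates.update(local_frequent(p, min_sup))
    # Phase 2: one pass builds global tidlists, then candidates are counted.
    tl = tidlists(transactions)
    result = {}
    for c in candidates:
        tids = set.intersection(*(tl[frozenset([i])] for i in c))
        if len(tids) >= min_sup * n:
            result[c] = len(tids)
    return result
```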
