

Research on Image Preprocessing Techniques in Image Mining

【Author】 Liu Maofu (刘茂福)

【Advisor】 He Yanxiang (何炎祥)

【Author Information】 Wuhan University, Computer Software and Theory, 2005, PhD

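【Summary】 With advances in image acquisition and image storage technology, and especially with the rapid growth of image data on the Internet, we now face a situation of "abundant image data but scarce information and knowledge about the images." The need for image mining theory and system tools that can automatically extract meaningful semantic information and knowledge from image data is increasingly urgent. This need quickly drew the attention of researchers in data mining, information retrieval, artificial intelligence, multimedia, and other related fields, who brought data mining techniques into image research in order to discover the information and knowledge hidden in large volumes of image data and thereby guide decisions based on image information. From the perspective of image understanding, it is also natural to combine data mining with image understanding and so arrive at image mining. Following Fayyad's definition of data mining, image mining can be described as the nontrivial process of extracting, from complex image data, hidden, valid, novel, potentially useful, and ultimately user-understandable semantic information and knowledge. Image mining is a key technique and method for image understanding.

The main open problems in image mining include:
  • System framework model: image mining needs a practical, workable system framework model.
  • Image preprocessing: the objects to be mined include not only complex image data but also text data associated with the images. Representing image data directly with the traditional relational model does not work well, so complex image data must first undergo thorough and effective preprocessing before it can be mined.
  • Difficulty of feature description: after the images have been processed with digital image processing techniques, the features of the image objects must still be extracted and described before precise knowledge can be mined. The basic features of an image include text, color, texture, position, and shape, and effort needs to be concentrated on developing new methods for describing image features.
  • Excessive feature dimensionality: the feature vectors obtained by image feature extraction have too many dimensions. This makes them unsuitable for the subsequent image mining methods and increases the complexity of the mining algorithms, so the feature vectors must be optimized or reduced in dimension.

This dissertation studies image mining and its key techniques. Its main contents and contributions are as follows:
  • Image mining framework model: image mining framework models are studied from both the function-driven and the information-driven direction, and a more practical image mining process framework model is given.
  • Image feature extraction: the representation and description of visual and semantic image features is very important in image mining. Alongside a discussion of shape feature description based on orthogonal invariant moments, and because the descriptive power of invariant moments needs to be evaluated and verified, the dissertation proposes a method, based on image reconstruction, for evaluating how well orthogonal invariant moments describe image shape features.
  • Optimization of image moment shape features: describing the shape of an image object precisely with image moments requires too many moment values, which increases the cost of computing the moments and makes the moment vector too high-dimensional. The dissertation therefore combines evolutionary computation with moment description and proposes a low-order moment optimization algorithm, so that the optimized low-order moments describe the shape of the image object fairly precisely.
  • High-dimensional feature reduction: the feature data obtained by extracting features from the images in an image database or image set are generally high-dimensional and must be reduced in dimension before an image mining algorithm is actually run. Common dimension reduction methods include principal component analysis and factor analysis, but they operate mainly on numerical features. Drawing on rough set theory, the dissertation proposes a new dimension reduction method for high-dimensional image feature data.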
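The reconstruction-based evaluation of moment descriptors mentioned above can be illustrated with a small sketch. It is not the dissertation's own procedure, which the abstract does not spell out. It assumes Zernike moments as the orthogonal moment family, a square grayscale image mapped onto the unit disk, and a normalized squared error as the evaluation measure; the function names are hypothetical.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^|m|(rho); assumes n - |m| is even and >= 0."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + m) // 2 - s)
                   * factorial((n - m) // 2 - s)))
        out += coef * rho ** (n - 2 * s)
    return out


def unit_disk_grid(size):
    """Map pixel centres of a size x size image into [-1, 1]^2 (polar form)."""
    coords = (2 * np.arange(size) + 1 - size) / size
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return rho, theta, rho <= 1.0


def zernike_moments(img, max_order):
    """Discrete approximation of Z_nm for all valid (n, m) with n <= max_order."""
    size = img.shape[0]
    rho, theta, mask = unit_disk_grid(size)
    pixel_area = (2.0 / size) ** 2
    moments = {}
    for n in range(max_order + 1):
        for m in range(-n, n + 1):
            if (n - abs(m)) % 2:
                continue  # basis functions exist only when n - |m| is even
            basis = radial_poly(n, m, rho) * np.exp(1j * m * theta)
            total = np.sum(img[mask] * np.conj(basis[mask]))
            moments[(n, m)] = (n + 1) / np.pi * total * pixel_area
    return moments


def reconstruct(moments, size):
    """Rebuild the image inside the unit disk from a dictionary of moments."""
    rho, theta, mask = unit_disk_grid(size)
    rec = np.zeros((size, size), dtype=complex)
    for (n, m), z in moments.items():
        rec += z * radial_poly(n, m, rho) * np.exp(1j * m * theta)
    return np.where(mask, rec.real, 0.0)


def reconstruction_error(img, max_order):
    """Normalized squared error between the image and its reconstruction."""
    size = img.shape[0]
    _, _, mask = unit_disk_grid(size)
    rec = reconstruct(zernike_moments(img, max_order), size)
    diff = (img - rec)[mask]
    return float(np.sum(diff ** 2) / np.sum(img[mask] ** 2))


# Illustrative input: a filled square that lies entirely inside the unit disk.
shape = np.zeros((64, 64))
shape[20:44, 20:44] = 1.0
for order in (2, 6, 12):
    print(order, reconstruction_error(shape, order))
```

Under these assumptions, a lower error at a given maximum order suggests that the moments up to that order capture more of the shape, which is one way to compare descriptor settings.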

【Abstract】 With the development and maturing of image acquisition and storage technology, and especially with the sharp increase in images on the Internet, we face a huge volume of image data and have no time to examine its content in detail. People need techniques and tools to analyze image data and to discover the underlying, useful knowledge and patterns in the images. Following Fayyad's definition of data mining, we can define image mining as the nontrivial process of discovering valid, novel, potentially useful, and ultimately understandable knowledge from large image sets or image databases. From the viewpoint of image understanding, it is also natural to combine data mining with image understanding to study the emerging field of image mining.

There are many challenges in image mining, including the following:
  • Framework: a feasible framework is needed within which image mining and its related techniques can be developed.
  • Image preprocessing: the objects to be mined include not only image data but also the text data associated with it. Representing image data directly with the relational model does not work well, so the image data must first be preprocessed effectively before the mining process can be carried out.
  • Feature description: to apply traditional data mining methods, designed for relational data and databases, to images and extract precise knowledge, the basic and content features of the images must be extracted and described. More attention should be paid to developing new methods for describing image features.
  • The curse of dimensionality: feature extraction yields too many feature dimensions to analyze. To mine images efficiently and effectively with traditional data mining methods, the dimensionality of the image features must be optimized or reduced.

The goal of the dissertation is to develop and apply techniques and methods for carrying out the image mining task. The major contents and contributions are as follows:
  • Image mining framework: after analyzing the function-driven and the information-driven image mining frameworks, we put forward a feasible image mining process framework.
  • Image feature extraction: image mining is based on the basic and semantic features of images, so representing and describing them is a critical phase. We focus on extracting and describing features of the image object, and we propose evaluation measures for the Zernike moment descriptor based on image reconstruction.
  • Optimization of the image shape moment feature: describing an image shape precisely requires too many moment values in the shape moment vector, which increases the cost of computing the moments and leads to the curse of dimensionality for the moment vector. We put forward an optimization algorithm for the shape moment feature, based on evolutionary computation, that represents and describes the shape of the image object with moments of correspondingly low order.
  • Dimension reduction: the feature data produced by preprocessing an image database or image set are usually high-dimensional. Common dimension reduction techniques include principal component analysis and factor analysis, but they usually operate on numerical features. The dissertation puts forward a new dimension reduction technique for high-dimensional image feature data based on rough set theory.
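To make the idea of rough-set-based dimension reduction concrete, the following is a minimal sketch of a standard greedy attribute reduction (in the style of QuickReduct). It is not the new technique proposed in the dissertation, whose details the abstract does not give. It assumes the image features have already been discretized into bins and that each image carries a decision label; the function names and the toy table are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, decisions, attrs):
    """Rough set dependency degree: |positive region of attrs| / |universe|."""
    positive = 0
    for block in partition(rows, attrs):
        # A block belongs to the positive region if all its rows agree on the decision.
        if len({decisions[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, decisions):
    """Greedily add attributes until the full table's dependency degree is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, decisions, all_attrs)
    reduct = []
    while dependency(rows, decisions, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, decisions, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


# Toy decision table: columns are hypothetical discretized feature bins
# (for example colour, texture, shape); labels are hypothetical image classes.
features = [
    (0, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 1),
]
labels = ["A", "A", "B", "B", "C", "C"]
print(quick_reduct(features, labels))
```

The greedy search is a heuristic: it returns a subset of attributes that preserves the dependency degree of the full table, but not necessarily the smallest such subset. Unlike principal component analysis, it works on discrete, non-numerical attribute values.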

  • 【Online Publication Contributor】 Wuhan University
  • 【Online Publication Issue】 2007, No. 06