

Content-Based Image Retrieval in Scientific Databases

【Author】 Tang Junhua (唐俊华)

【Advisor】 Yan Baoping (阎保平)

【Author information】 Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology), Computer Architecture, 2004, Ph.D.

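【Abstract (Chinese version)】 As scientific databases continue to develop, they will contain more and more multimedia information such as images, audio, and video, and this information poses major challenges for scientific database applications. Obtaining relevant information from a huge volume of multimedia data is the foremost problem facing many scientific database application systems. Drawing on image processing, pattern recognition, computer vision, and database techniques, this dissertation studies the key problems involved in content-based image retrieval. The research has two goals. First, it aims to improve the accuracy of content-based image retrieval systems, focusing on image feature extraction, retrieval, and indexing methods suited to image retrieval in scientific databases. Second, in view of the particular application environment of image retrieval in scientific databases, it discusses models of distributed image retrieval and studies in depth the key problems that arise in distributed image retrieval systems. The main results are as follows:
1) A method is proposed that represents color image features by extracting the relative positions and topological structure of adjacent dominant colors. At retrieval time it exploits not only the color features of an image but also the spatial distribution of color regions, which effectively improves the precision of the system. Extracting the dominant colors and their spatial topology does not involve complex image segmentation, so the algorithm is simple and effective.
2) To address the shortcomings of the traditional color histogram, a weighted color histogram representation of color images is proposed. It keeps the advantages of the traditional color histogram (simplicity, efficiency, and insensitivity to viewpoint changes) while incorporating the spatial distribution of colors into the histogram, which effectively improves precision. As a basic retrieval method, it can be integrated with other retrieval methods.
3) Image collections in scientific databases are generally large. In similar-image retrieval, quickly retrieving a set of images that roughly meets the requirements is often more attractive than spending a long time retrieving a set that matches them exactly. Based on this observation, the dissertation proposes an image feature indexing algorithm that uses LSH (Locality Sensitive Hashing). The algorithm effectively prunes a very large search space and reduces the time complexity of similarity search. Experiments show that the index still performs well as data size and dimensionality grow, and that it scales better than traditional indexing methods.
4) In scientific databases, image data resources are heterogeneous and distributed, which calls for a distributed image retrieval architecture. After analyzing the strengths and weaknesses of different distributed architectures, the dissertation adopts a peer-to-peer architecture. To address the key problems of peer-to-peer image retrieval, it proposes a query routing method based on query content, which effectively improves the query efficiency of the distributed image retrieval system, and it gives a preliminary discussion of an XML-based image retrieval protocol.
5) A distributed image retrieval prototype system, PeerSeek, was implemented.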
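
The abstract does not specify how the adjacent dominant colors or their topology are computed. The following is a minimal sketch under stated assumptions, not the dissertation's algorithm: the image is coarsely quantized and split into fixed-size tiles, each tile is assigned its most frequent quantized color, and every pair of horizontally or vertically adjacent tiles with different dominant colors is recorded as a (color, neighbor color, relation) triple. The tile grid stands in for segmentation, in line with the claim that no complex segmentation is needed. The functions `quantize`, `block_main_colors`, `adjacency_features`, and `feature_overlap` and the parameters `levels` and `block` are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: tile-based dominant colors plus adjacency triples.
# Images are lists of rows of (r, g, b) tuples with channels in 0..255.


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel to one of levels**3 coarse color bins."""
    r, g, b = (c * levels // 256 for c in pixel)
    return (r * levels + g) * levels + b


def block_main_colors(image, block=2, levels=4):
    """Return a grid holding the most frequent quantized color of each
    block x block tile (tiles at the right/bottom edge may be smaller)."""
    h, w = len(image), len(image[0])
    grid = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            counts = Counter(
                quantize(image[y][x], levels)
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            )
            row.append(counts.most_common(1)[0][0])
        grid.append(row)
    return grid


def adjacency_features(grid):
    """Count (color, neighbor_color, relation) triples for tiles whose right
    or lower neighbor has a different dominant color."""
    feats = Counter()
    for y, row in enumerate(grid):
        for x, c in enumerate(row):
            if x + 1 < len(row) and row[x + 1] != c:
                feats[(c, row[x + 1], "right")] += 1
            if y + 1 < len(grid) and grid[y + 1][x] != c:
                feats[(c, grid[y + 1][x], "below")] += 1
    return feats


def feature_overlap(f1, f2):
    """Similarity of two triple counters: shared count over the larger total."""
    shared = sum((f1 & f2).values())
    return shared / max(sum(f1.values()), sum(f2.values()), 1)
```

Because each triple records which color lies to the right of or below which, an image with red on the left and blue on the right yields a different feature from its mirror image, even though both contain the same colors in the same amounts.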

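The abstract does not give the weighting scheme of the weighted color histogram. One plausible scheme, assumed here purely for illustration, weights each pixel by 1 plus the fraction of its 4-connected neighbors that fall in the same color bin, so pixels inside uniform regions contribute more than isolated ones. The histogram is then normalized and compared using histogram intersection. The names `quantize`, `weighted_histogram`, and `histogram_intersection` are hypothetical.

```python
# Hypothetical sketch of a spatially weighted color histogram.
# The weighting rule (1 + fraction of same-bin 4-neighbors) is an assumption.


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0..255 channels to one of levels**3 bins."""
    r, g, b = (c * levels // 256 for c in pixel)
    return (r * levels + g) * levels + b


def weighted_histogram(image, levels=4):
    """Normalized histogram in which each pixel adds 1 + (share of its
    in-bounds 4-neighbors that fall in the same color bin)."""
    h, w = len(image), len(image[0])
    bins = [[quantize(p, levels) for p in row] for row in image]
    hist = [0.0] * (levels ** 3)
    for y in range(h):
        for x in range(w):
            c = bins[y][x]
            inside = [
                (ny, nx)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            same = sum(1 for ny, nx in inside if bins[ny][nx] == c)
            coherence = same / len(inside) if inside else 0.0
            hist[c] += 1.0 + coherence
    total = sum(hist)
    return [v / total for v in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

For example, the one-row images R R R B and R B R R have identical classical histograms (three red pixels, one blue), but under this weighting the red bin receives 5.5/6.5 ≈ 0.846 of the mass in the first image and 4.5/5.5 ≈ 0.818 in the second, so the spatial arrangement now affects the match score.
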
【Abstract】 With the development of scientific databases, more and more multimedia information, including images, audio, and video, is being incorporated into them. Scientific databases therefore face a major challenge: how to retrieve relevant information from a vast multimedia repository. Using techniques from image processing, pattern recognition, computer vision, and databases, this dissertation studies key problems in content-based image retrieval. On the one hand, it investigates new methods for image feature extraction, image retrieval, and feature vector indexing in the scientific database environment; on the other hand, because scientific databases are inherently distributed, it discusses several models of distributed image retrieval systems and studies the related issues in depth. The contributions of the dissertation are as follows:
1) A new color image feature extraction method is proposed that extracts neighboring dominant colors and their topological relationships. The resulting feature uses not only color but also the spatial relationships between different dominant colors. Because the extraction method does not involve complex image segmentation, it is simple and practical.
2) The classical color histogram has a drawback: it contains no information about the spatial distribution of pixels in an image. This dissertation presents a new weighted color histogram feature that integrates the spatial distribution of pixels into the color histogram. It retains the advantages of the classical color histogram, such as simplicity and insensitivity to viewpoint changes, while substantially improving precision. As a fundamental retrieval method, it can easily be integrated with other methods.
3) Image collections in scientific databases are typically very large, so quickly retrieving an approximate result set is often more attractive than spending a long time obtaining a precise one. Based on this observation, a feature vector indexing method based on Locality Sensitive Hashing (LSH) is proposed. It effectively prunes very large search spaces and reduces the time complexity of similarity search. Experimental results show that the method remains effective as the data scale grows and that it scales better than traditional tree-structured indexing methods.
4) In the scientific database environment, image data sources are heterogeneous and distributed, so an image retrieval system needs a distributed architecture.
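
The abstract does not state which LSH family the dissertation uses. The sketch below assumes the p-stable (Gaussian projection) family for Euclidean distance, h(v) = floor((a·v + b) / w), with `hashes_per_table` such functions concatenated into each table key and `num_tables` independent tables. Candidates are the union of the buckets the query falls into, re-ranked by exact distance; only those candidates, not the whole collection, are compared exactly, which is the source of the speed/accuracy trade-off described in contribution 3. The class `EuclideanLSH` and all parameter defaults are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch: p-stable LSH for Euclidean distance.
# h(v) = floor((a . v + b) / w), with a ~ N(0, I) and b ~ U[0, w).


class EuclideanLSH:
    def __init__(self, dim, num_tables=8, hashes_per_table=4,
                 bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        self.vectors = []
        # Each table holds its (a, b) hash functions and a bucket dictionary.
        self.tables = []
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)],
                 rng.uniform(0.0, bucket_width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        """Concatenate the table's hash values into one bucket key."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, v):
        """Store v and register it in one bucket per table; return its id."""
        idx = len(self.vectors)
        self.vectors.append(list(v))
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)
        return idx

    def query(self, q, k=1):
        """Return up to k ids ranked by exact distance, drawn only from
        vectors that share a bucket with q in at least one table."""
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), ()))
        ranked = sorted(candidates, key=lambda i: math.dist(q, self.vectors[i]))
        return ranked[:k]
```

Increasing `hashes_per_table` makes buckets more selective (fewer candidates, faster queries, more misses), while increasing `num_tables` raises the chance that a true neighbor shares at least one bucket with the query. A query can return fewer than `k` results, or none, when no stored vector collides with it, which reflects the approximate nature of the method.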

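Contribution 4 mentions a query routing method based on query content but does not describe the peer summaries or the routing rule. As an illustrative assumption only, each peer below advertises the centroid of its image feature vectors, and a node forwards a query to the `fanout` neighbors whose centroids lie closest to the query feature instead of flooding every neighbor. `Peer`, `summary`, and `route_query` are hypothetical names.

```python
import math
from dataclasses import dataclass, field

# Hypothetical sketch: route a query only to the neighbors whose advertised
# content summary (here, the centroid of their feature vectors) is nearest.


@dataclass
class Peer:
    name: str
    features: list = field(default_factory=list)

    def summary(self):
        """Centroid of this peer's feature vectors."""
        n = len(self.features)
        dim = len(self.features[0])
        return [sum(f[i] for f in self.features) / n for i in range(dim)]


def route_query(query, neighbors, fanout=2):
    """Names of the `fanout` neighbors with the closest summaries;
    peers that hold no images are skipped."""
    scored = sorted(
        (math.dist(query, p.summary()), p.name)
        for p in neighbors
        if p.features
    )
    return [name for _, name in scored[:fanout]]
```

Flooding would send every query to every neighbor; routing by summary distance sends it only where matching images are likely, at the cost of possibly missing relevant images held by a peer whose centroid is far from the query.
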

