海量多媒体数据库的高效查询处理

Efficient Query Processing over Large-Scale Multimedia Databases

【Author】 Zhuang Yi (庄毅)

【Advisor】 Zhuang Yueting (庄越挺)

【Author information】 Zhejiang University, Computer Applications, 2007, PhD

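【Abstract (translated from the Chinese)】 With the rapid development of multimedia and network technologies, the Internet has become a vast and complex multimedia information space. The massive multimedia resources it contains have the following characteristics: 1) huge in volume and growing rapidly; 2) rich in content and diverse in form; 3) complex in structure and widely distributed; 4) unordered and disorganized. Given these vast resources, how to retrieve them quickly and accurately and process them efficiently has become an important research topic. Taking digital libraries as the target application, this thesis addresses large-scale multimedia data and poses and solves a series of problems in efficient query processing over massive data. It studies indexing and query techniques for massive high-dimensional multimedia information in both depth and breadth, focusing on the following five problems:
● To address the "curse of dimensionality" in querying massive high-dimensional multimedia data, two high-dimensional indexing methods are proposed, namely the composite-distance-transformation-based high-dimensional index (CDT) and the encoding-based dual distance tree index (EDD-Tree), to improve retrieval efficiency over massive multimedia data;
● For the characteristics of calligraphic character data, an interactive calligraphic character index based on the partial distance map (PDM) and a calligraphic character index based on the hybrid distance tree (HD-Tree) are proposed;
● To address the poor query performance over massive multimedia data in a single-machine environment, a scalable parallel query technique based on a data grid is further proposed. It comprises optimized distribution of massive data across grid nodes, fast index-based reduction of the high-dimensional data set, parallel pipelined processing, and an efficient data transfer mechanism. Theoretical analysis and experiments show that the technique significantly improves the efficiency of similarity queries;
● For frequent user query requests, a multi-query optimization technique for high-dimensional similarity queries in a grid environment is proposed, further improving the concurrency of massive multimedia retrieval under query-intensive conditions;
● For the characteristics of massive cross-media retrieval, a unified indexing framework for cross-media data, called CIndex, is proposed. It should be noted that research on massive cross-media retrieval and indexing has only just begun in the international and domestic academic communities, and there is almost no related work. This thesis studies the problem systematically and in depth and proposes a series of methods and theoretical results for cross-media retrieval and indexing, which are of considerable theoretical and practical significance.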

【Abstract】 With the rapid growth of multimedia and Internet technologies, the Internet has become a huge and complex multimedia information space. The multimedia resources on the Web are characterized by: 1) a huge amount of data; 2) heterogeneity and multiple modalities; 3) complex structure; 4) lack of order. Given these massive resources, how to retrieve and manage such large-scale multimedia information quickly and accurately is a very important research topic. The work presented in this thesis extends both the depth and the breadth of research on query, index, and update techniques over large-scale high-dimensional multimedia data. The work focuses on the following five aspects:
● To address the "curse of dimensionality" of multimedia data, we propose two high-dimensional index schemes, namely a composite-distance-transformation (CDT)-based high-dimensional index and an encoding-based dual distance tree (EDD-Tree) index, which improve the efficiency of large-scale multimedia retrieval;
● For the characteristics of Chinese calligraphic characters, we propose two index schemes, namely a partial-distance-map (PDM)-based interactive character index and a hybrid-distance-tree (HD-Tree)-based character index;
● Because the retrieval performance of large multimedia databases in a single-PC environment is not satisfactory, we propose a grid-based retrieval algorithm that takes advantage of the parallelism of grid computing to further improve retrieval efficiency. The technique includes an optimal data allocation policy in the grid environment, index-based vector set reduction, a pipeline mechanism, and a highly efficient data transfer method;
● With the increase of query-intensive applications, we propose a multi-query optimization technique for similarity search in a grid environment, which further increases the parallelism of large-scale multimedia retrieval under query-intensive workloads;
● To effectively support large-scale cross-media retrieval, we propose an integrated index structure called CIndex. To the best of our knowledge, this is the first work to study cross-media retrieval and indexing. The experimental results show the effectiveness and efficiency of this method.

  • 【Online publication submitter】 Zhejiang University
  • 【Online publication year/issue】 2008, No. 07