
海量多媒体数据的地理信息标注技术及其应用

Geo-tagging for Large-scale Multimedia Data and Its Applications

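【Author】 Liu Heng (刘衡)

【Advisor】 Li Houqiang (李厚强)

【Author information】 University of Science and Technology of China (中国科学技术大学), Signal and Information Processing, 2014, PhD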

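【Summary】 With the rapid development of computer, communication, and multimedia technologies, people can easily capture multimedia data such as images and video and share them with other users over the network. The volume of information on the Internet is growing explosively, bringing people abundant information resources, and multimedia data, represented by images and video, accounts for an ever larger share of it. How to organize and manage massive multimedia data effectively has become a problem of growing concern to both industry and academia. Automatic geo-tagging of multimedia data lets users discover relevant multimedia data conveniently and quickly, and it also helps with storing and visualizing such data, so it has considerable theoretical significance and practical value. However, geo-tagging massive multimedia data faces several challenges. For images, video, and other multimedia data, we need not only the geographic location but often also estimates of the camera orientation, the location of the captured scene, geometric structure information, and so on, for applications such as virtual navigation. To address the incomplete annotation and limited accuracy of existing multimedia geo-tagging techniques, this thesis proposes a visual localization technique based on matching 2D images to 3D scenes, which yields accurate and complete geo-tag information for images. The research focuses on vision-based image geo-tagging methods and covers the estimation of complete geo-tag information for images, the optimization of geo-tagging techniques, and applications of geo-tagging. The main work and contributions of the thesis can be summarized as follows.

(1) The thesis proposes a precise image geo-tagging technique based on matching 2D image features to points of 3D scene models. First, 3D models of individual geographic locations are obtained through image clustering and 3D reconstruction. For an image supplied by the user, large-scale image retrieval matches it to the corresponding images and 3D scene, and the 2D image is finally registered to the 3D model. This yields complete geographic information for the image, including the camera position, the camera orientation, and the location of the captured scene, with relatively high accuracy. The thesis also discusses in depth the implementation of this system on mobile devices and related mobile applications, including a vision-based localization and automatic navigation application that helps users better understand their surroundings.

(2) The thesis proposes an algorithm that optimizes image geo-tagging to improve tagging accuracy. First, it proposes a method for generating a visual vocabulary codebook that can discriminate between geographic locations. Using the geo-tags already contained in the image database as prior knowledge, it obtains the distribution of each visual word in the codebook over geographic locations, which is used to measure how discriminative and how descriptive each visual word is with respect to location. By embedding the discriminative and descriptive power of visual words in the codebook, the thesis achieves better geographic image retrieval and localization results. The thesis also analyzes the image scene to extract its geometric structure, which enables more accurate geo-tagging and yields the geometric position of buildings in the image.

(3) The thesis applies geo-tagging to multimedia processing and proposes an algorithm that uses massive Internet data to guide image inpainting. First, large-scale image geo-tagging retrieves reference images that capture the same scene as the target image from a similar viewpoint. Information is then extracted from the reference images and propagated to the target image. The thesis discusses and analyzes in detail several kinds of structure information that can guide image inpainting, and designs algorithms for detecting and extracting them from the reference images. Finally, using the extracted structure information as prior knowledge, the thesis implements several structure-guided inpainting algorithms whose results are visually pleasing and consistent with the perceptual characteristics of the human visual system. Unlike earlier algorithms that use only information from the target image itself or rely on manual user interaction, the proposed inpainting algorithm is data-driven.

In summary, addressing the geo-tagging of massive multimedia data on the Internet, this thesis studies how to estimate complete and accurate geographic information for images, optimizes existing geo-tagging techniques to improve system stability and accuracy, and considers how geo-tagging can be applied to other areas of multimedia. It considers a series of new problems and proposes a series of new methods, and extensive experiments and application scenarios verify the effectiveness of the proposed methods.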
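
As one way to picture the geographic weighting of visual words described in contribution (2), the sketch below scores each visual word by how concentrated its occurrences are across locations. The entropy-based formula, the function name `geo_word_weights`, and the input layout are assumptions made for illustration; the abstract does not state the exact measure used in the thesis.

```python
import math
from collections import Counter, defaultdict


def geo_word_weights(observations, num_locations):
    """Weight each visual word by its geographic concentration (illustrative).

    observations: iterable of (word_id, location_id) pairs, one per feature
        occurrence in a geo-tagged database image (hypothetical layout).
    num_locations: total number of distinct locations in the database.
    Returns {word_id: weight}, where 1.0 means the word occurs at a single
    location and 0.0 means it is spread uniformly over all locations.
    """
    if num_locations < 2:
        raise ValueError("need at least two locations to measure discrimination")

    # Count how often each word is observed at each location.
    counts = defaultdict(Counter)
    for word_id, location_id in observations:
        counts[word_id][location_id] += 1

    max_entropy = math.log(num_locations)
    weights = {}
    for word_id, per_location in counts.items():
        total = sum(per_location.values())
        # Shannon entropy of the word's distribution over locations.
        entropy = -sum(
            (c / total) * math.log(c / total) for c in per_location.values()
        )
        # Low entropy (concentrated word) gives a weight close to 1.
        weights[word_id] = 1.0 - entropy / max_entropy
    return weights
```

In a retrieval pipeline, weights of this kind could multiply the usual term weights of a bag-of-visual-words index so that words tied to few places count more when voting for a location; whether the thesis combines them in this way is not stated in the abstract.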

【Abstract】 With the rapid growth of computer science, electronic communication, and multimedia techniques, people can conveniently obtain information and share it with other users on the Internet. The explosive growth of information on the Internet brings abundant information resources to people. Images and video make up most Internet traffic, so organizing and managing this large amount of multimedia data is one of the key problems that has drawn much attention from both industry and academia. Geo-tagging, which aims to add geographical identification metadata to multimedia data, can help users find a wide variety of location-specific information. It is also beneficial to the storage and visualization of these data. However, it becomes increasingly challenging to manage such an overwhelming amount of multimedia data. Not only the approximate position but also other geographical information, including the camera position, the camera viewing orientation, the scene location, and more specific geometric structure information, is needed for further applications such as virtual navigation. In this thesis, we propose a novel content-based localization approach that aligns the 2D image to 3D scene models to calculate the geographic information.

In this thesis we focus on techniques for content-based image geo-tagging, including the estimation of comprehensive geographic parameters, the optimization of localization results, and applications in image inpainting with Internet photos. The contributions of this thesis can be summarized as follows.

Firstly, we propose a novel vision-based localization method that estimates the comprehensive geographic parameters of a given image. 3D scene models are obtained by reconstruction from image clusters. For a given query image, similar images are retrieved and then used to vote for the related 3D scene model. Finally, the 2D image is aligned to the 3D scene model for localization. The estimated geographical parameters include the camera location, the viewing direction, and the scene location. This comprehensive information can be used for mobile applications such as virtual navigation to help users better understand their surroundings.

Secondly, we propose an optimization method to enhance the accuracy of geo-tagging. We propose a scheme to efficiently generate visual codebooks with strong discriminative power across different locations. Using the geo-tags of the database images as prior knowledge, we calculate the geographic distribution of each visual word to measure its discriminative power. We obtain better location recognition performance with the proposed visual word weighting scheme. Furthermore, we propose to analyze the query image to recover more specific structure of the scene, leading to more precise geo-tagging of the image.

Thirdly, we explore the application of geo-tagging in image processing. We present an image completion method that replaces a specified region of a photograph using other reference photographs from the Internet. We search the Internet for candidate images that capture the same scene or building using image geo-tagging. Then we establish geometric relationships between the candidate images and the query image. The geometric relationships are represented by homography transformations estimated from viewpoint-invariant local feature matches. Given these transformations, we can project the structure information from the candidate images to the target image. The extracted structure information includes line structures and region segmentation information, which are very helpful for image completion. Finally, we use this structure information for image inpainting to obtain fine-grained image completion results.

In a nutshell, in this thesis we explore and discuss techniques for geo-tagging large amounts of multimedia data on the Internet from novel and distinctive perspectives, and propose several applications based on geo-tagging. Comprehensive experiments demonstrate the effectiveness and efficiency of the proposed algorithms.
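
The projection step in the third contribution, in which line structures are carried from a candidate image into the target image through a homography, can be sketched as follows. This is a minimal illustration that assumes the 3x3 homography has already been estimated from feature matches; the function names, the nested-list matrix layout, and the endpoint representation of line segments are assumptions, not the thesis's implementation.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        # The point lies on the line that H sends to infinity.
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (u / w, v / w)


def project_segments(H, segments):
    """Transfer line segments from a reference image to the target image.

    segments: list of ((x1, y1), (x2, y2)) endpoint pairs in the reference
        image (hypothetical representation of detected line structures).
    Returns the same segments expressed in target-image coordinates.
    """
    return [(apply_homography(H, p), apply_homography(H, q)) for p, q in segments]
```

Mapping only the endpoints is enough here because a homography sends straight lines to straight lines, provided the segment does not cross the line that is mapped to infinity. Region boundaries could be transferred the same way by projecting their polygon vertices.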
