
立体图像和视频编辑的研究

Stereo Image and Video Editing

【Author】 晏涛 (Yan Tao)

【Advisors】 黄刘生 (Huang Liusheng); 徐云 (Xu Yun)

【Author information】 University of Science and Technology of China (中国科学技术大学), Computer Software and Theory, 2013, Ph.D.

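【Abstract (Chinese original, translated)】 Following the great success of the 3D movie "Avatar", stereo images and videos have become increasingly popular in recent years. A stereo (3D) image consists of two ordinary 2D images of the same scene, captured at the same time from two slightly different viewpoints. When a stereo image/video is displayed on a screen, a viewer wearing a suitable viewing device sees only the left-view image with the left eye and only the right-view image with the right eye. Once the visual system passes the simultaneously acquired left and right views to the brain, the brain fuses the two slightly different images and derives the depth of the 3D scene. Because stereo images convey more visual information and look more realistic, stereo images and videos are regarded as a main direction for the future development of images and videos.

Although a large number of algorithms and software tools exist for processing 2D images/videos, very few tools can process stereo images/videos. Processing stereo images/videos is harder than processing 2D images/videos, for three main reasons. First, it is difficult to obtain accurate, noise-free disparity/depth maps. Stereo matching algorithms try to compute pixel correspondences between viewpoints, but after many years of research their results are still unsatisfactory, and they are worse still for complex natural scenes. Even when a depth camera is used to capture the depth of a real scene, producing high-resolution, noise-free depth images remains difficult, because the depth images produced by existing depth cameras have very low resolution and the cameras themselves are bulky and expensive. Second, it is difficult to keep the left and right views consistent when editing a stereo image. Consistency between the left and right views of the resulting stereo image is extremely important for minimizing distortion and producing high-quality results. In practice, the left and right views often have to be processed together to guarantee this consistency, for example by placing both in a single global optimization. Stereo image/video processing algorithms are therefore usually more complex than their 2D counterparts and require more computation and memory. Third, we need to preserve the consistency of motion and depth between neighboring frames of a stereo video, in order to eliminate jitter between neighboring frames of the output video.

In this thesis, we discuss the underlying problems of stereo image editing and try to resolve these technical difficulties in order to provide efficient stereo image/video editing algorithms. We mainly present the following three stereo image and video editing methods.

First, we propose a new depth adjustment method for stereo videos. Almost all current 3D movies are shot primarily for playback on large cinema screens, with the audience at some distance from the screen, and the depth range of the target video is computed accordingly. When such a stereo video is played on a 3D television, a computer screen, or a mobile phone, its original depth range is greatly reduced, which seriously degrades the stereoscopic effect. This hinders the distribution and enjoyment of stereo images and videos on small digital mobile devices. We therefore propose a linear depth mapping method to adjust the depth range of a stereo video. Our method computes the actual depth range during playback from the viewing parameters, such as the screen size and resolution and the distance from the viewer to the screen. It also takes into account characteristics of human stereo vision, such as the importance of relative depth between objects for depth perception and the sensitivity of the human eye to the bending of straight lines and planes. The proposed method minimizes distortion of the image content, mainly by preserving the relative depth between neighboring feature points and by preventing straight lines and planes from bending. It preserves the spatial structure of the 3D scene contained in the stereo video, so that the structure is not damaged when the depth range changes. It also preserves depth and motion consistency between neighboring frames. Depth consistency ensures that the depth of objects changes smoothly between neighboring frames. Motion consistency ensures that the motion of objects is smooth across neighboring frames in both the left and right views. Experimental results show that our method improves the stereoscopic effect of stereo videos and produces high-quality output with minimal image distortion.

Second, to obtain high-quality stereo depth mapping and other stereo image editing effects, we extend the shift-map algorithm so that it can be used to edit stereo images. We use a global optimization that processes the left and right views simultaneously at the pixel level. Our method ensures consistency between the left and right views and preserves the 3D scene structure conveyed by the images. It also handles occlusion and disocclusion, which allows it to address many stereo image editing problems, such as stereo depth mapping, adjusting the depth of objects in a stereo image, and non-uniform image resizing. Experimental results show that each of these editing functions produces high-quality results.

Third, we propose a method for generating infinite stereo panoramas. An infinite stereo panorama is a panoramic image generated by stitching, whose width can be extended indefinitely by continually stitching on further stereo images. The stereo images used for stitching depict similar scenes but may have been taken at different geographic locations. Infinite stereo panoramas can be used, for example, to create interesting walkthrough scenes in virtual reality. The most important problem in generating an infinite stereo panorama is how to stitch two stereo images seamlessly. Although many 2D image stitching methods exist, they may not be able to handle stereo images, because ensuring disparity consistency can be difficult. In this thesis, we propose a method for stitching stereo images. We first use the graph cut algorithm to find a pair of seams along which the left and right views can be stitched respectively. When computing this pair of seams, we make the content on both sides of each seam as smooth as possible after stitching and suppress possible visual artifacts. We then apply a warping-based disparity adjustment algorithm to further suppress depth jumps across the seam. Our method generates high-quality infinite stereo panoramas, and the experimental results demonstrate its effectiveness.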
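The dependence of perceived depth on the viewing configuration, and the idea of linearly remapping a disparity range, can be illustrated with textbook stereoscopic viewing geometry. The sketch below is not the thesis's optimization, which additionally preserves neighboring-feature relationships, lines, planes, and temporal coherence. The function names, the default eye separation of 0.065 m, and the sign convention (positive disparity means the point appears behind the screen) are all assumptions made for illustration.

```python
def perceived_depth(disparity_px, screen_width_m, h_res_px, view_dist_m,
                    eye_sep_m=0.065):
    """Distance from the viewer at which a point with the given on-screen
    disparity appears, using similar triangles.

    Assumed convention: positive disparity = uncrossed (behind the screen),
    negative = crossed (in front of the screen).
    """
    # Convert pixel disparity to a physical offset on the screen.
    s = disparity_px * screen_width_m / h_res_px
    if s >= eye_sep_m:
        raise ValueError("disparity >= eye separation: the eyes would diverge")
    return eye_sep_m * view_dist_m / (eye_sep_m - s)


def linear_disparity_map(d, src_range, dst_range):
    """Linearly remap disparity d from src_range=(lo, hi) to dst_range=(lo, hi)."""
    (s0, s1), (t0, t1) = src_range, dst_range
    if s1 == s0:
        raise ValueError("source disparity range must be non-degenerate")
    scale = (t1 - t0) / (s1 - s0)
    return t0 + scale * (d - s0)
```

Evaluating `perceived_depth` for the same pixel disparity on a cinema-sized configuration and on a phone-sized one shows how strongly the depth range around the screen shrinks on the small display. `linear_disparity_map` then stretches the source disparities into a target range chosen for the new display.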

【Abstract】 With the success of the 3D movie "Avatar", stereo videos have become very popular in recent years. In general, each stereo image contains two regular 2D images of the same scene, captured at the same time from slightly different viewpoints. When a stereo image/video is displayed on a screen, viewers wearing appropriate devices see one regular 2D image/frame with the left eye and the other with the right eye. The human brain then fuses the two images/frames to produce 3D scene depth information. As stereo images can convey more visual information, stereo media are considered one of the main directions of future development.

Although many tools are available for editing traditional 2D images/videos, tools for editing 3D media are very limited. In general, editing and processing stereo images/videos is more difficult than editing and processing 2D images/videos, for three major reasons. First, it is difficult to obtain noise-free and accurate disparity/depth maps for stereo images/videos. Stereo matching methods, which aim at finding correspondences between pixels in the left and right images, generally do not perform very well, especially for stereo images of natural scenes. Even if we use a depth camera, obtaining high-resolution, noise-free depth maps from its low-resolution, noisy output is still difficult. Second, it is difficult to ensure the spatial coherence between the left and right images of a stereo image pair, which is very important for minimizing distortion and producing high-quality results. In practice, the left and right images usually need to be processed simultaneously to enforce this coherence, for example within a single global optimization. Thus, algorithms for processing stereo media are usually more complex than those for 2D media, with high computational and memory costs. Third, we need to ensure both motion and depth coherence across neighboring frames.

In this thesis, we discuss fundamental problems in stereo image and video editing, attempt to address these technical difficulties, and provide users with a number of editing methods for processing stereo images/videos. We mainly introduce the following three editing methods.

First, we propose a novel depth mapping method for stereo videos. Most stereo videos are developed primarily for viewing on large screens located at some distance from the viewer. If we watch these videos on a small screen located close to us, the depth range of the videos is seriously reduced, which can significantly degrade their 3D effects. To address this problem, we propose a linear depth mapping method that adjusts the depth range of a stereo video according to the viewing configuration, including pixel density and distance to the screen. We also consider characteristics of human binocular vision, such as the importance of relative depth among objects to depth perception and the sensitivity of human eyes to the bending of straight lines and planes. Our method tries to minimize the distortion of stereo image content by preserving the relationship of neighboring features and preventing line and plane bending. It also considers motion and depth coherence across neighboring frames. While depth coherence ensures smooth changes of the depth field across frames, motion coherence ensures smooth content changes across frames. Our experimental results show that the proposed method can improve the stereoscopic effects while maintaining the quality of the output videos.

Second, in order to obtain high-quality depth mapping and other stereo editing effects, we extend the shift-map method to stereo image editing. Our method simultaneously processes the left and right images at the pixel level using a global optimization algorithm. It enforces photo consistency between the two images and preserves 3D scene structures. It also addresses the occlusion and disocclusion problems, which enables many stereo image editing functions, such as depth mapping, object depth adjustment, and non-homogeneous image resizing. Our experimental results show that the proposed method produces high-quality results for a number of editing functions.

Third, we propose a method for creating infinite stereo panoramas. An infinite stereo panorama is a panoramic image that can be extended indefinitely by continuously stitching together stereo images that depict similar scenes but may be taken at different geographic locations. It can be used to create interesting walkthrough environments. An important issue underlying this application is how to stitch two stereo images together seamlessly. Although many methods have been proposed for stitching 2D images, they may not work well on stereo images, because ensuring disparity consistency is difficult. In this thesis, we propose a novel method to stitch two stereo images seamlessly. We first apply the graph cut algorithm to compute a seam for stitching, with a novel disparity-aware energy function that both ensures disparity continuity and suppresses visual artifacts around the seam. We then apply a modified warping-based disparity scaling algorithm to suppress the seam in the depth domain. Our experimental results show that the proposed stitching method is capable of producing high-quality infinite stereo panoramas.
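The idea of a disparity-aware stitching seam can be illustrated with a much simpler technique than the one in the thesis. The sketch below replaces graph cut with a dynamic-programming minimum-cost vertical seam, and its cost (absolute color difference plus weighted absolute disparity difference over the overlap region) is an assumed stand-in for the thesis's energy function, which is not specified in the abstract. All names and the `disp_weight` parameter are hypothetical. Inputs are plain nested lists of numbers for a single view.

```python
def seam_cost(color_a, color_b, disp_a, disp_b, disp_weight=1.0):
    """Per-pixel stitching cost over the overlap of images A and B:
    color mismatch plus weighted disparity mismatch (assumed form)."""
    rows, cols = len(color_a), len(color_a[0])
    return [[abs(color_a[r][c] - color_b[r][c])
             + disp_weight * abs(disp_a[r][c] - disp_b[r][c])
             for c in range(cols)]
            for r in range(rows)]


def min_vertical_seam(cost):
    """Return one column index per row for the cheapest top-to-bottom seam,
    where the seam may move at most one column between adjacent rows."""
    rows, cols = len(cost), len(cost[0])
    # acc[r][c] = minimum total cost of any seam ending at pixel (r, c).
    acc = [list(cost[0])]
    for r in range(1, rows):
        prev = acc[-1]
        acc.append([cost[r][c] + min(prev[max(0, c - 1):min(cols, c + 2)])
                    for c in range(cols)])
    # Backtrack from the cheapest pixel in the last row.
    c = min(range(cols), key=acc[-1].__getitem__)
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, c - 1), min(cols, c + 2)
        c = min(range(lo, hi), key=acc[r - 1].__getitem__)
        seam.append(c)
    seam.reverse()
    return seam
```

Pixels to the left of the returned seam would be taken from image A and pixels to its right from image B. Raising `disp_weight` pushes the seam toward regions where the two disparity maps agree, which is the intuition behind penalizing disparity discontinuities. The sketch handles one view only and does not couple the left-view and right-view seams.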
