
网格基单目和立体视频编码及相关技术研究

Research on Monocular and Stereo Video Coding Based on Mesh Model and Related Technologies

【Author】 Guo Dabo (郭大波)

【Advisor】 Lu Zhaoyang (卢朝阳)

【Author information】 Xidian University (西安电子科技大学), Communication and Information Systems, 2009, Ph.D.

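【Abstract, translated from the Chinese】 With H.264/AVC maturing, many researchers believe that coding based on statistical redundancy removal has little compression potential left, and that future video coding should look for answers in computer vision, computer graphics, and the human visual system. Mesh-model-based video coding is one such new technique. At its core, it uses theory and algorithms from computer vision and computer graphics to represent an image sequence with structured data. It became a research hotspot as early as ten years ago as an approach to very-low-bitrate video communication. Many problems remain open: computational complexity is high, robustness is poor, and no generally accepted, effective method exists for motion occlusion or disparity occlusion. As a result, past research was largely confined to applications with simple backgrounds and simple motion, such as videoconferencing. To address these problems, the work and main contributions of this dissertation are:

1. Basic data structures for a Delaunay triangular mesh (DTM) generation algorithm are built from triangle-element classes (vertex, edge, triangle, and the DTM itself), raising generation speed by one third without reducing description accuracy.

2. By analyzing why two existing DTM generation criteria are non-robust and unstable, a criterion that minimizes the sum of squared gray-level errors (the MSSD criterion) is derived for generating content-adaptive DTMs. The classical mesh generation algorithm based on motion-region growth constraints and node proximity constraints, referred to here as the optical-flow method, is optimized and improved to generate motion-adaptive meshes.

3. A four-step fast motion estimation algorithm based on mesh node tracking is proposed. Nodes are removed in regions being occluded by motion and added in uncovered regions so that node tracking stays valid. The size of the occluded region is sensed from the number of removed or added nodes or from model failure, and an adaptive GOP structure is then formed according to that size. Finally, a mesh-based hybrid coding scheme suited to complex backgrounds and complex motion is proposed. Experiments with a mesh coder built within the H.263 reference model framework show compression performance better than the H.263 advanced motion mode on videos with complex backgrounds and complex motion.

4. A block-based maximum a posteriori (MAP) stereo disparity estimation algorithm is proposed, which introduces prior knowledge on top of the correlation and MSE methods to improve matching performance.

5. For videoconferencing, a four-step fast disparity estimation algorithm based on mesh node tracking is proposed. It accounts for inter-view luminance compensation and global occlusion boundary detection, and the synthesis of intermediate virtual-view images is also examined. Experiments show that, owing to its fast convergence, the algorithm outperforms comparable algorithms in both speed and accuracy. Meshes also offer some advantage in speed and algorithmic simplicity when synthesizing intermediate virtual views.

6. To mark occluded regions explicitly on the disparity map, the disparity space is computed and dynamic programming is then used to search for the optimal disparity curve. Each point on the curve carries one of three state labels: a matching state or one of two occlusion states. To ensure the curve passes through the path control points (ground control points, GCPs), a segmented dynamic programming algorithm is proposed. It divides the disparity space image into path-control regions and non-path-control regions. In the former the path is forced through the control points, and in the latter the optimal path is found by dynamic programming. Four criteria for selecting control points are proposed so that the points are highly reliable. Experiments show that the new algorithm gives some improvement over conventional dynamic programming in occlusion detection and matching accuracy, is reliable, and has low computational cost.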
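The abstract does not spell out the node-tracking motion model used in contribution 3. In mesh-based coding generally, motion inside a triangle is commonly interpolated from the displacements of its three nodes by an affine map. The sketch below shows that standard construction under that assumption. The names `affine_from_triangle` and `warp_point` are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def affine_from_triangle(src, dst):
    """Return the 2x3 affine matrix that maps three source nodes onto
    three destination nodes.

    src, dst: sequences of three (x, y) node positions, e.g. a triangle's
    nodes in the reference frame and their tracked positions in the
    current frame. This is an illustrative helper, not the thesis code.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is (x, y, 1).
    H = np.hstack([src, np.ones((3, 1))])
    # Solve H @ A.T = dst for the 3x2 matrix A.T.
    # np.linalg.solve raises LinAlgError if the triangle is degenerate.
    return np.linalg.solve(H, dst).T


def warp_point(A, p):
    """Apply the 2x3 affine matrix A to a point p = (x, y)."""
    x, y = p
    return A @ np.array([x, y, 1.0])


# Example: all three nodes move by (+1, +2), so every interior point
# moves by the same translation.
A = affine_from_triangle([(0, 0), (4, 0), (0, 4)],
                         [(1, 2), (5, 2), (1, 6)])
print(warp_point(A, (1, 1)))
```

Because the map is fixed by the three node positions alone, only node displacements would need to be transmitted for each triangle, which is one reason node tracking is attractive at low bitrates.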

【Abstract】 With the increasing maturity of H.264/AVC-based video coding, many scholars believe that compression based on statistical redundancy removal is approaching its limit, and that future video coding techniques should look for solutions in computer vision, computer graphics, and the human visual system. Mesh-model-based video coding is one such new technique, in which computer vision and computer graphics methods are employed to represent an image sequence in a structured way. It was a research hotspot more than ten years ago as a way to solve video communication problems at very low bitrates. Mesh-model-based video coding still has many problems to solve, such as high computational complexity, poor robustness, and the lack of an effective solution to the motion occlusion and stereo occlusion problems. Previous studies in this field were therefore limited to applications with simple backgrounds and simple motion, such as videoconferencing. Aiming at these problems, the main contributions of the work presented in this dissertation are:

1. Basic data structures are built for the Delaunay triangular mesh (DTM) generation algorithm using triangular element classes: the vertex class, the segment class, the triangle class, and the DTM class. They speed up mesh generation by one third without decreasing the approximation precision of the DTM.

2. Through an analysis of the poor robustness and instability of two existing DTM generation criteria, a new criterion that minimizes the sum of squared differences in gray level (the MSSD criterion) is derived for generating content-adaptive DTMs. The classical algorithm with nodal proximity constraints in temporal activity regions, referred to here as the optical-flow method, is optimized and improved for generating motion-adaptive DTMs.

3. A four-stage fast motion estimation algorithm based on nodal trajectories is proposed, in which nodes in motion occlusion regions are removed and new nodes are added in the uncovered background to guarantee effective nodal trajectories. The sizes of the regions being occluded and uncovered are perceived from the number of added or deleted nodes or from mesh model failure, and an adaptive GOP is constructed accordingly. Finally, a mesh-based hybrid video coding scheme is presented. Experimental results from a mesh-based coding implementation built on the H.263 reference model show that the scheme outperforms the advanced motion estimation mode of H.263 in compression efficiency for videos with complex backgrounds and complex motion.

4. A new stereo disparity estimation algorithm employing the maximum a posteriori (MAP) criterion is proposed. It introduces prior knowledge into the normalized correlation and MSE methods to increase matching performance.

5. For videoconferencing applications, a fast four-stage disparity estimation algorithm based on nodal trajectories is proposed, in which illumination compensation between views and global occlusion boundary detection are studied. Virtual viewpoint synthesis is also investigated. Experimental results show that, owing to its fast convergence, the proposed algorithm outperforms other corresponding algorithms in both speed and precision. In addition, meshes have advantages in speed and simplicity for virtual viewpoint synthesis.

6. In order to mark occluded regions explicitly on the disparity map, the disparity space is first calculated and dynamic programming is then employed to search for the optimal disparity curve. Each point on the optimal disparity curve must be in one of three states: the matching state or one of two occlusion states. To guarantee that the disparity curve passes through the ground control points (GCPs), a segmented dynamic programming algorithm is proposed: the disparity space image is divided into ground control regions and non-ground control regions. In a ground control region, the search path is forced to pass through the GCPs. In a non-ground control region, the optimal path is searched by dynamic programming. To make the GCPs reliable, four criteria are presented for choosing a point as a GCP. Experimental results show that the new algorithm improves the precision of occlusion detection and matching to some extent, and is more reliable and faster than conventional dynamic programming algorithms.
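The segmented dynamic programming idea in contribution 6 can be sketched on a single scanline as follows. This is a minimal illustration, not the dissertation's algorithm: it assumes a simple alignment-style formulation with a squared-difference match cost and a constant occlusion cost, rather than a search over a disparity space image, and the names `dp_align` and `segmented_dp` are hypothetical. Each path step is in one of three states: `M` (match), `L` (pixel visible only in the left view), or `R` (pixel visible only in the right view). Ground control points are given as index pairs that the path is forced to match, and ordinary dynamic programming runs only in the segments between them.

```python
def dp_align(left, right, occ_cost):
    """Align two scanline segments with three states per step.

    Returns (total_cost, path). Path entries are ('M', i, j) for a match,
    ('L', i, None) for a left-only pixel, ('R', None, j) for a right-only
    pixel. The cost model is an assumption made for this sketch.
    """
    n, m = len(left), len(right)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match left[i-1] with right[j-1]
                c = cost[i - 1][j - 1] + (left[i - 1] - right[j - 1]) ** 2
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "M"
            if i > 0:  # left pixel has no partner in the right view
                c = cost[i - 1][j] + occ_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "L"
            if j > 0:  # right pixel has no partner in the left view
                c = cost[i][j - 1] + occ_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "R"
    # Trace the optimal path back from the end of both segments.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        state = back[i][j]
        if state == "M":
            path.append(("M", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif state == "L":
            path.append(("L", i - 1, None))
            i -= 1
        else:
            path.append(("R", None, j - 1))
            j -= 1
    path.reverse()
    return cost[n][m], path


def segmented_dp(left, right, gcps, occ_cost=10.0):
    """Run dp_align between consecutive GCPs and force a match at each GCP.

    gcps: list of (i, j) index pairs, strictly increasing in both i and j.
    """
    total, path = 0.0, []

    def run_segment(i0, i1, j0, j1):
        nonlocal total
        c, seg = dp_align(left[i0:i1], right[j0:j1], occ_cost)
        total += c
        for s, a, b in seg:  # shift local indices to scanline indices
            path.append((s,
                         None if a is None else a + i0,
                         None if b is None else b + j0))

    pi = pj = 0
    for gi, gj in gcps:
        run_segment(pi, gi, pj, gj)
        path.append(("M", gi, gj))  # the path is forced through the GCP
        total += (left[gi] - right[gj]) ** 2
        pi, pj = gi + 1, gj + 1
    run_segment(pi, len(left), pj, len(right))
    return total, path
```

For a matched pair `('M', i, j)`, the disparity is `i - j`. Splitting at the control points also shrinks each dynamic programming table, which is consistent with the lower computational cost the abstract reports, although the actual savings depend on how many control points are found.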
