
Research on Key Technologies of Object-oriented Processing in Video Network Transmission (视频网络传输中面向对象处理的关键技术研究)

【Author】 Fu Xiang (符祥)

【Advisor】 Guo Baolong (郭宝龙)

【Author information】 Xidian University, Circuits and Systems, 2008, PhD

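【Abstract (Chinese version, translated)】 As demand for digital video grows across many fields, network video transmission is receiving increasing attention, and video-object-oriented coding and processing has become a research focus. The core of second-generation compression coding, represented by MPEG-4, is object-based scalable coding, and the accompanying video-object-oriented processing techniques underpin efficient encoding, correct decoding, and high-quality video output. This dissertation therefore studies the key video-object-oriented processing techniques needed for high-quality network video services: video object segmentation, video object shape error concealment, and video-object-based interpolation. First, accurate video object segmentation improves coding efficiency and video quality. Second, because transmission errors are unavoidable on networks, good shape error concealment is essential to correct decoding and to the quality of the video output. Finally, because display devices and display modes on the user side are diverse, using interpolation to change image resolution and obtain a high-quality display has clear practical value. The main content and contributions are as follows:
1. To overcome the shortcomings of traditional frame-difference-based video object segmentation, a new video object segmentation algorithm (CCHVS) is proposed. Based on the principle that, in the HVC color space, the measured difference between two colors is consistent with human visual perception, the frame-difference motion detection method is improved, which increases the stability of motion detection and its robustness to noise and illumination changes. The video object is extracted from the current frame and the recovered background image, which yields more complete object contours and also works well for fast-moving objects and for multiple objects. In addition, to meet the needs of video-object-oriented processing, a video-object-based region segmentation algorithm (RSVO) is proposed.
2. To speed up video object segmentation and save memory, color quantization is studied. To address the slow speed and high memory use of the traditional octree color quantization algorithm, a modified octree color quantization algorithm (MOCQ) is proposed. Limiting the height of the octree to 4 levels saves a large amount of storage. Merging nodes by first collecting statistics top-down and then merging bottom-up avoids the very large number of leaf nodes and saves considerable processing time. Error diffusion is applied to correct the color quantization error, which improves image quality.
3. Building on a study of spatial-domain video object shape error concealment, a spatial-domain shape error concealment algorithm based on cubic B-spline interpolation (CBI) is proposed. It addresses the shortcomings of traditional spatial methods based on Bézier interpolation, namely that determining the additional control points is complicated and that the concealment result depends on where the control points are placed. To avoid the drawbacks of conventional spline interpolation (back-calculating the control vertices, heavy computation, and inconvenient local modification), a simple matrix formula for implementing CBI is derived. Applied to spatial shape error concealment, the formula interpolates directly from the known contour points without adding extra control points, so the concealment process is simple and easy to implement.
4. Traditional temporal video object shape error concealment is suitable only when the motion between video objects in adjacent frames is small. For objects with large rotation and translation, a temporal shape error concealment algorithm that is robust to rotation and translation (TRRT) is proposed. Based on the rotation and translation invariance of the Harris corner detector and of local Zernike moments, features of adjacent objects are matched. Texture information is used in the matching, which makes it more robust than using only the binary shape plane of the object. The contour of the reference object is motion compensated so that the lost shape information is concealed with the most similar shape, which gives good concealment results for arbitrary translation and rotation between objects.
5. Traditional image interpolation handles edge pixels ambiguously because their region membership is unclear, which blurs the image and lowers its objective quality. To overcome this, an image interpolation algorithm directed by video objects and regions (ORD) is proposed. First, RSVO is used for region segmentation, and the nearest-neighbor method and the mode (majority) method are combined to decide which region each pixel to be interpolated belongs to. The interpolation formulas are designed around region consistency: pixels inside a region are interpolated linearly to keep the region smooth, while for transition pixels between regions a nonlinear formula gives larger weights to neighboring pixels in the same region and smaller weights to neighboring pixels in other regions. Second, ORD can apply the region-directed method only within the object of interest and use a simple, fast linear method for the background and other objects, which keeps processing fast while preserving image quality in the region of interest. Applied to image enlargement and image reduction, ORD produces images with better subjective visual quality and improved objective quality. A modified version applied to the enlargement of underwater laser target images also gives good results.
6. A framework for a distributed video surveillance system in a heterogeneous environment is proposed, and the process and methods for implementing a surveillance system within this framework are illustrated with an example. Moving object segmentation, transmission error concealment, and interpolation are applied in the system and yield good video quality. The main features of the system are: automatic tracking of moving targets and adjustment of camera parameters; easy addition of new functions and new monitoring points; no restriction of distance or location, since surveillance is possible wherever there is Internet access or mobile phone signal; simple, low-cost hardware; and a seamless transition to 3G systems.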
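Item 2 above limits the octree to four levels. As a rough illustration of what that bound implies, the following Python sketch assumes 8-bit RGB pixels and maps each color to the octree leaf reached after four levels, which depends only on the top four bits of each channel (at most 4096 leaves). The function names `octree_path` and `quantize` are hypothetical, and the sketch deliberately omits the top-down/bottom-up node merging and the error diffusion step of MOCQ, whose details the abstract does not give.

```python
from collections import defaultdict


def octree_path(r, g, b, depth=4):
    """Child indices (0-7) from the root down to `depth` for an 8-bit RGB color."""
    path = []
    for level in range(depth):
        shift = 7 - level  # walk from the most significant bit downward
        idx = (((r >> shift) & 1) << 2) | (((g >> shift) & 1) << 1) | ((b >> shift) & 1)
        path.append(idx)
    return tuple(path)


def quantize(pixels, depth=4):
    """Replace each pixel by the mean color of its depth-limited octree leaf.

    Simplification for illustration: no node merging, no error diffusion.
    """
    sums = defaultdict(lambda: [0, 0, 0, 0])  # r_sum, g_sum, b_sum, count
    for r, g, b in pixels:
        acc = sums[octree_path(r, g, b, depth)]
        acc[0] += r
        acc[1] += g
        acc[2] += b
        acc[3] += 1
    palette = {
        key: (v[0] // v[3], v[1] // v[3], v[2] // v[3]) for key, v in sums.items()
    }
    return [palette[octree_path(r, g, b, depth)] for r, g, b in pixels], palette
```

With `depth=4`, two nearby reds such as `(250, 10, 10)` and `(254, 12, 8)` fall into the same leaf and receive the same averaged color, while the number of leaves stays bounded regardless of image size.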

【Abstract】 With the growing demand for digital video, video transmission over networks has received more and more attention, and object-oriented coding and processing has become a research hot spot. Object-based coding is the core of MPEG-4, which represents the second generation of video coding standards. Object-based processing techniques help guarantee efficient encoding, correct decoding, and high-quality video output. To obtain high-quality video services, this dissertation studies several key processing techniques for network video transport, namely video object segmentation, shape error concealment, and spatial resolution transformation, and then applies them in a distributed video surveillance system in a heterogeneous environment. The main contributions are as follows:
1. First, based on the characteristics of the Human Vision System (HVS), an automatic video object segmentation method based on the Color Consistency of HVS (CCHVS) is presented. CCHVS obtains the frame difference mask according to human perception, and this motion detection method is more effective than traditional ones. Because the moving object is separated by comparing the current frame with a reliable background image, the algorithm handles complex scenes, such as fast-moving objects and multiple objects, efficiently. Second, to meet the requirements of MPEG-4 object-oriented processing, a Region Segmentation method based on Video Object (RSVO) is proposed. The mean shift process can be restricted to the area of the video object. RSVO is faster and uses less memory than the traditional mean shift method, and it suits situations where high speed is needed and memory is limited.
2. A modified octree color quantization algorithm (MOCQ) is proposed. It limits the depth of the octree to 4 to save memory, and it adopts a bidirectional pruning mechanism, first comparing top-down and then pruning bottom-up, which avoids the large number of leaves and improves processing speed. An error diffusion method is used to obtain better image quality.
3. A spatial shape error concealment method based on Cubic B-spline Interpolation (CBI) is proposed. First, to avoid the deficiencies of traditional B-spline interpolation methods, which are computationally expensive and inconvenient for local modification, a matrix-form representation of the CBI curve is presented. Then, this matrix form is applied to shape error concealment. Compared with traditional spatial methods based on Bézier interpolation, the proposed method generates the interpolating curve directly from the correctly received boundary points, without inserting any additional control points, and it is simple to implement.
4. Based on the rotation and translation invariance of both the Harris interest point detector and local Zernike moments, a Temporal shape error concealment scheme Robust to Rotation and Translation (TRRT) is proposed. First, to improve the accuracy of shape motion estimation, the texture data is used in addition to the binary alpha shape plane of the video object (VO). Then, interest points are detected with the Harris interest point detector, and the best matching pairs of interest points between two objects are found by comparing the Euclidean distance between local Zernike moments defined on the neighborhood of each interest point. The global motion parameters are determined and the previous boundary is motion compensated. Finally, the missing boundary pieces are reconstructed from the most similar part of the motion-compensated boundary. TRRT is robust to rotation and translation between objects at consecutive time instants.
5. A video Object and Region Directed image interpolation method (ORD) is proposed. First, the rationale for image interpolation based on region uniformity is analyzed. Then, the image is segmented with the RSVO method, and the region to which an interpolated pixel belongs is decided by an approach that combines the nearest-neighbor method and the statistical mode. The design of the interpolation formulas reflects region uniformity. For pixels within a region, linear interpolation is used to keep the region smooth. For transition pixels between different regions, nonlinear interpolation formulas are designed, in which larger weights are assigned to the neighboring pixels that contribute more to the interpolated value. To meet the requirements of MPEG-4 object-oriented applications, the region-directed processing can be applied only within the object of interest, while a faster and simpler linear method is used in other areas. This saves resources while guaranteeing high quality for the region of interest. Experimental results show that ORD produces images with higher subjective and objective quality than traditional methods for both up-sampling and down-sampling. ORD also gives good results when used for underwater laser image enlargement.
6. A framework for distributed video surveillance in a heterogeneous environment is proposed, and its feasibility is demonstrated with a prototype implementation. System performance is improved by the key techniques above: moving object segmentation, transmission error concealment, and image interpolation. The main characteristics of the proposed system are as follows: it can be configured remotely to track moving objects and adjust the camera parameters automatically; new functions and new monitoring nodes can be added easily; surveillance can be performed wherever there is Internet access or mobile telephone signal; the system is inexpensive and easy to build from simple equipment, so it can be widely used in practice; and it can be extended seamlessly to a third generation (3G) system.
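Item 3 refers to a matrix-form representation of a cubic B-spline curve. The abstract does not reproduce the dissertation's own interpolating formula, so the Python sketch below only illustrates the general idea of a matrix form using the standard uniform cubic B-spline basis matrix. Note the difference: this standard form approximates its four control points rather than passing through them, whereas the CBI formula described above is said to interpolate the received boundary points directly. The names `bspline_weights` and `bspline_point` are hypothetical.

```python
# Standard uniform cubic B-spline basis matrix (to be scaled by 1/6).
# Rows correspond to t^3, t^2, t, 1; columns to the four control points.
BASIS = [
    [-1, 3, -3, 1],
    [3, -6, 3, 0],
    [-3, 0, 3, 0],
    [1, 4, 1, 0],
]


def bspline_weights(t):
    """Blending weights [t^3, t^2, t, 1] * BASIS / 6 for a parameter t in [0, 1]."""
    powers = (t ** 3, t ** 2, t, 1.0)
    return [sum(powers[i] * BASIS[i][j] for i in range(4)) / 6.0 for j in range(4)]


def bspline_point(p0, p1, p2, p3, t):
    """Point on one uniform cubic B-spline segment defined by four 2-D points."""
    weights = bspline_weights(t)
    points = (p0, p1, p2, p3)
    return tuple(sum(weights[j] * points[j][k] for j in range(4)) for k in range(2))
```

Sampling `t` over `[0, 1]` for each run of four consecutive boundary points yields a smooth curve piece by piece, and moving one point changes only the segments that use it, which is the local-modification property the abstract alludes to.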
