Research on Objective Quality Assessment Methods Based on Visual Saliency for Packet-Loss-Impaired Images and Video
Saliency Inspired Objective Quality Model on Packet-Loss-Impaired Image and Video
【Author】 Feng Xin
【Author Information】 Chongqing University, Computer Application Technology, 2011, Doctoral dissertation
【Abstract (translated from the Chinese)】 With the ever-wider use of networked video in entertainment, education, and business, high-quality video compression techniques continue to advance, and users' demand for high-quality video keeps rising. However, besides the quantization distortion introduced by lossy compression, video data carried over networks may also suffer packet loss caused by congestion or delay in the transmission channel. Because spatial-temporal motion estimation is used extensively in video encoding and decoding, the loss of video packets can severely degrade the viewing quality of the decoded video at the receiving terminal. Images and video impaired by network packet loss exhibit distinctive visual characteristics, yet objective quality assessment research to date has focused mainly on compression artifacts. Effectively assessing packet-loss-impaired images and video, and in particular building an objective quality assessment method consistent with the perceptual characteristics of human vision, is therefore of great significance for the design and monitoring of network services.

This work targets images and video impaired by network packet loss. Starting from the visual characteristics of packet-loss-distorted images and video, and incorporating the selective attention property of human vision, it proposes a full-reference objective quality assessment model based on visual saliency. The main contributions are as follows:

Based on an analysis of the spatial-temporal visual characteristics of packet-loss-distorted images and video, this thesis proposes to evaluate their perceptual quality by exploiting the saliency-based visual attention of the human visual system. The visually salient attention information of images and video is obtained with Itti's bottom-up salient-region detection model. For dynamic video, motion is treated as an additional salient feature: by incorporating the biologically inspired HR motion perception detector, a biologically grounded multi-scale HR salient motion perception model is implemented within Itti's salient-region detection model.

Two databases were constructed, one for packet-loss-impaired images and one for video. Seventeen standard video sequences from a video research institute in the USA serve as originals, and network packet-loss events were simulated to generate the impaired reconstructed images and video sequences. To exclude the influence of loss position, loss length, and the forgiveness effect present in long sequences, so that the visual saliency information fully reflects the visual characteristics of packet-loss video, each test video is a short sequence containing only a single packet-loss event. For each database, single-stimulus subjective quality assessment experiments were conducted strictly following the ITU-R BT.500-11 recommendation; their results serve as the ground truth for the objective quality methods in this thesis.

By exploring the bottom-up, image-data-driven pre-attentive mechanism of the human visual system, and based on the spatial visual characteristics of packet-loss-impaired images/video, a first hypothesis is formed: distortion introduced by packet loss degrades image/video quality more when it appears in visually salient regions than when it appears in non-attended regions such as the background. Adopting the quality-assessment idea of weighting distortion according to the HVS, the thesis applies visual saliency information to packet-loss-impaired images/video for the first time and proposes a family of full-reference objective quality metrics based on saliency-weighted distortion.

By exploring the top-down, prior-knowledge-guided high-level visual attention mechanism, and based on the spatial-temporal visual characteristics of packet-loss-impaired images/video, a second hypothesis is formed: human eyes are easily attracted by suddenly introduced abnormal events or locally abnormal regions, because such events or regions differ from the attention regions expected under prior perceptual knowledge and thereby redirect gaze. By examining the spatial changes in visual attention that packet-loss-distorted images/video induce relative to the reference, as well as the temporal attention variations of different magnitudes in video, and by measuring the corresponding spatial saliency differences and temporal variation magnitudes, the thesis proposes a novel family of full-reference objective quality metrics based on spatial-temporal saliency variation.

After a comprehensive comparison of the two families of saliency-based quality metrics, every metric is transformed with a best monotonic mapping and used as an evaluation factor in a unified model. Using stepwise linear regression analysis with cross-validation, linear assessment models based on visual saliency information are built for packet-loss-impaired single images and for video sequences, respectively. Compared with traditional objective quality models that ignore saliency information and with a standard video quality model, the experimental results show that the proposed saliency-based objective quality models effectively predict the perceptual quality of packet-loss-impaired images and video, and that visual saliency is effective and important visual information for building objective quality metrics for such content. This thesis opens a new direction for research on image and video quality assessment based on visual selective attention and provides a meaningful reference for the exploration and application of human visual perception information.
【Abstract】 The growing popularity of networked video, with applications as broad as entertainment, education, and business, drives the rapid development of high-definition video compression technology and further increases end users' demand for higher-quality video. However, in many networked video applications, videos may not only be distorted by quantization in the compression process, but also corrupted during transmission by physical channel bit errors, congestion, delay, and so on. Ultimately, these various channel impairments all lead to losses of video packets. Because of the use of spatial-temporal motion compensation in video coding and decoding, received videos affected by packet loss may suffer severe quality distortion. Packet-loss-impaired images and video have their own distinctive visual characteristics, but most quality metrics developed so far are concerned with artifacts induced by lossy image/video coders. Hence, being able to quantify the quality of packet-loss-impaired images and video, and especially to build an objective assessment method that agrees with human perceptual quality, is very important for network service design and provision.

This thesis aims at objectively evaluating the perceptual quality of packet-loss-impaired images and video. Based on the visual characteristics of images and video affected by packet loss, we explore the selective visual attention of the human visual system and the application of visual saliency information to quality assessment, and finally propose a saliency-based full-reference objective quality assessment model. The major contributions of this thesis are:

Based on the analysis of the spatial-temporal visual characteristics of packet-loss-impaired images and video, we propose to use the saliency-based visual attention of the human visual system to evaluate their perceptual quality. Visual saliency is determined by Itti's bottom-up saliency detection model.
For dynamic video, we extend the saliency estimation by integrating motion as a salient feature: a multi-scale, biologically inspired HR motion detector is implemented within Itti's saliency detection model.

We construct two databases, one for packet-loss-impaired images and one for videos. Seventeen original videos are selected from the standard video database of a video research institute in the USA, and a simulation procedure is applied to each original video to obtain the packet-loss-impaired images and videos. The final impact of a loss on the quality of a longer video depends on the error location, length, severity, and the forgiveness effect; to better isolate the fundamental visual saliency information of loss-affected videos, we therefore construct short sequences that contain only a single packet-loss event. Ground truth for evaluating the performance of the objective quality metrics is obtained by carrying out a single-stimulus subjective test for each database following the ITU-R BT.500-11 recommendation.

We first investigate the bottom-up, image-driven pre-attentive stage of the human visual system. Based on the hypothesis that a packet-loss-induced error appearing in a salient region is much more annoying than the same distortion in an inconspicuous area, we propose a category of saliency-weighted pixel-error full-reference objective quality metrics. Although the general idea of saliency-weighted quality assessment is similar to some prior work, we are the first to demonstrate the merit of saliency information in evaluating the perceived quality of packet-loss-impaired images/video.

We then explore the top-down, knowledge-driven, high-level attentive stage of the human visual system and find that human eyes tend to be attracted by unexpected events or locally abnormal regions: such regions differ from the knowledge-guided attention regions, and attention scanpaths change accordingly.
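A saliency-weighted pixel-error metric can be sketched as follows. This is a generic illustration of the weighting idea (a saliency-weighted mean squared error), assuming a precomputed saliency map; the abstract does not specify the thesis's exact pooling formula.

```python
import numpy as np

def saliency_weighted_mse(ref, dist, sal):
    """Full-reference score: squared pixel error weighted by a saliency
    map normalized to sum to 1, so errors in salient regions count more.
    Higher values mean worse perceived quality."""
    w = sal / sal.sum()
    return float(np.sum(w * (ref - dist) ** 2))
```

With a uniform saliency map the score reduces to plain MSE; a non-uniform map penalizes the same pixel error more heavily when it falls in a salient region, which is exactly the first hypothesis above.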
Since packet-loss-induced artifacts have similar visual characteristics, we examine the spatial changes in saliency between the original and distorted image or video, as well as the temporal variation of the distorted video's saliency map, and finally propose a novel category of quality metrics based on spatial-temporal saliency variation.

We make a general comparison of the two categories of saliency-based quality metrics by evaluating their correlation with subjective ratings. Each metric is then transformed with an appropriate non-linear mapping into an evaluation factor. Our final saliency-based video quality model (S-VQM) linearly combines a subset of all considered evaluation factors (including both non-saliency and saliency-related factors); the factors included and their weights are determined by a stepwise linear regression process. Comparisons with a traditional non-saliency-based quality model and with the standard video quality model demonstrate that S-VQM provides a significant improvement in correlation with subjective data and in prediction accuracy. Our work shows that considering saliency information can substantially improve the assessment of the perceptual quality of packet-loss-impaired images and video.

This thesis opens a new prospect for visual-selective-attention-based objective quality assessment of images and video. It also provides a meaningful methodological reference for the investigation and application of human visual perceptual information.
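The spatial and temporal saliency-variation terms can be sketched as below. This is only an illustrative aggregation under the assumption that per-frame saliency maps are already available; the thesis's exact pooling of the two terms is not given in the abstract.

```python
import numpy as np

def saliency_variation_score(sal_ref, sal_dist):
    """sal_ref, sal_dist: lists of per-frame saliency maps (same shapes).
    Spatial term: mean absolute saliency difference between reference and
    distorted frames. Temporal term: mean absolute frame-to-frame change
    of the distorted video's saliency. Larger values indicate that packet
    loss has shifted visual attention more."""
    spatial = np.mean([np.abs(r - d).mean() for r, d in zip(sal_ref, sal_dist)])
    temporal = np.mean([np.abs(sal_dist[t] - sal_dist[t - 1]).mean()
                        for t in range(1, len(sal_dist))])
    return spatial, temporal
```

For an undistorted video both terms are zero; a packet loss that creates a new locally abnormal salient region raises both the spatial difference and the temporal variation, matching the second hypothesis.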