基于扩频技术的时域音频水印的研究与改进
Research and Improvements on Spread-Spectrum-Based Audio Watermarking in Time Domain
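【Author】 Li Lili (李力利);
【Advisor】 Fang Xiangzhong (方向忠);
【Author information】 Shanghai Jiao Tong University, Communication and Information Systems, 2009, PhD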
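【Abstract (translated from the Chinese)】 With the development of network and multimedia compression technologies, digital multimedia signals (text, audio, video and images) can be copied, transmitted and modified with ease. Strengthening information security and protecting intellectual property have therefore become urgent problems, and digital watermarking is one solution. Digital audio watermarking embeds additional information such as copyright data, called the watermark, into an audio signal by slightly altering the original data. This thesis considers only audio watermarking for copyright protection that uses spread-spectrum techniques to embed binary image data in the time domain. The main drawbacks of this class of watermark are a low embedding rate and poor robustness to certain attacks, notably synchronization attacks such as time-scale and pitch-scale modification. Without substantially changing the embedding algorithm, this thesis improves mainly the detection (extraction) algorithm to increase robustness. The improvements cover three aspects: an adaptive decision threshold, post-processing to resist time-scale modification, and whitening based on high-pass filtering to resist pitch-scale modification. Conventional detection compares the correlation result with a fixed threshold, determined by experiment or experience, to decide whether a watermark bit is "0" or "1". This thesis proposes an adaptive threshold selection scheme that statistically analyzes the correlation results of all frames and finds the dividing point between the two main distribution regions, which is taken as the optimal threshold for that audio signal. Experimental results show that the threshold obtained this way varies with the audio signal and with the attack, and that the attack has the most pronounced effect. The adaptive threshold not only improves robustness to common attacks such as high-order low-pass filtering, resampling, requantization and MPEG compression, but also makes resistance to pitch-scale modification possible. Time-scale modification (changing duration without changing pitch) alters the length of an audio signal by regularly duplicating or deleting short segments. Although watermark synchronization is destroyed, local data segments are unchanged, so correlation detection can still be performed; however, the change in the amount of audio data changes the number of extracted information bits, and the binary image assembled from them may be distorted beyond recognition. This thesis proposes a simple measure called "post-processing" to resist time-scale modification: a combination of interpolation and decimation scales the number of extracted bits back to its pre-attack size. The scaling ratio is determined by exhaustive search for the ratio that minimizes the watermark error rate, and the search is split into a coarse step and a fine step to improve efficiency. Experimental results show that the method has modest computation and storage requirements and is very effective against time-scale modification: for scaling ratios between 0.3 and 2.0, the bit error rate of the extracted watermark is below 16%. Pitch-scale modification (changing pitch without changing duration) combines time-scale modification with resampling. Because resampling applies uniform decimation and interpolation to the audio signal, no local data segment is left undamaged, and in theory the watermark cannot be extracted by correlation detection. However, extensive experiments in this thesis show that the optimal linear prediction error filter applied before correlation is a high-pass filter with nonlinear phase; if it is replaced by a linear-phase finite impulse response high-pass filter and combined with the adaptive threshold scheme, the watermark can still be extracted after a pitch-scale attack. Experimental results show that the method is very effective: for pitch-scale ratios between 0.7 and 2.3, the bit error rate of the extracted watermark is below 15%. Because high-pass filtering reduces robustness to low-pass-filtering attacks, the final detector combines the two detectors (one using the optimal prediction error filter, the other the linear-phase high-pass filter): the maximum of the two correlation values is taken and then compared with the threshold. Combining the two detection methods also further improves robustness to other attacks. In summary, the proposed detection algorithm is robust to almost all attacks and has low complexity, so it has good practical value.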
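The adaptive threshold described in the abstract can be illustrated with a minimal sketch. The abstract states only that the dividing point between the two main distribution regions of the per-frame correlation values is used as the threshold; the iterative two-means split below is an assumed stand-in for that analysis, not the thesis's own procedure, and the names `adaptive_threshold` and `decide_bits` are hypothetical.

```python
def adaptive_threshold(correlations, iterations=50):
    """Estimate a decision threshold from per-frame correlation values.

    Assumption: the values form two groups (bit 0 and bit 1), and the
    midpoint between the two group means approximates the dividing point.
    Expects a non-empty sequence.
    """
    threshold = (min(correlations) + max(correlations)) / 2.0
    for _ in range(iterations):
        low = [c for c in correlations if c <= threshold]
        high = [c for c in correlations if c > threshold]
        if not low or not high:
            break  # only one group found; keep the current threshold
        updated = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        if abs(updated - threshold) < 1e-12:
            break  # converged
        threshold = updated
    return threshold


def decide_bits(correlations, threshold):
    """Map each correlation value to a watermark bit."""
    return [1 if c > threshold else 0 for c in correlations]


# Hypothetical correlation values whose two groups were shifted by an attack:
# a fixed threshold of 0.5 would read every frame as 0.
shifted = [-0.5, -0.4, -0.45, 0.05, 0.1, 0.0]
bits = decide_bits(shifted, adaptive_threshold(shifted))
```

Because the threshold is recomputed from the observed values, it follows the shift an attack introduces, which is the behaviour the abstract reports for the adaptive scheme.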
【Abstract】 Recent developments in networking and multimedia compression allow digital media to be copied, transmitted and edited with ease. Strengthening information security and protecting intellectual property have therefore become urgent problems, and digital watermarking is one possible solution.
Digital audio watermarking embeds additional information such as copyright data, called the watermark, into an audio signal by making small modifications to the original data. This thesis focuses on audio watermarking for copyright protection that embeds binary image data in the time domain of the audio signal using the spread-spectrum technique. The main shortcomings of this kind of watermarking are its low embedding rate and its lack of robustness to certain attacks, notably synchronization attacks such as time-scale modification and pitch-scale modification. This thesis presents improvements to the detection (extraction) algorithm that increase robustness without substantially altering the embedding algorithm. The improvements are an adaptive decision threshold, post-processing to resist time-scale modification, and whitening based on high-pass filtering to resist pitch-scale modification. Traditional detection techniques extract the information bits by comparing the correlation values against a fixed threshold chosen by experiment or experience. This thesis proposes a scheme for selecting the threshold adaptively: the distribution of the correlation values of all frames is analyzed, and the dividing point between the two main distribution regions is taken as the optimum threshold for that audio signal. The experimental results show that the optimum threshold varies with the host audio and with the attack, especially the latter. The adaptive threshold scheme not only makes the watermark more resistant to common attacks such as high-order low-pass filtering, resampling, requantization and MPEG compression, but also makes robustness against pitch-scale modification possible.
Time-scale modification changes the duration of an audio signal by regularly duplicating or discarding small pieces of the original signal. Although the synchronization of the watermark is destroyed, the local data segments are unchanged, so correlation detection can still be carried out. However, the change in the number of audio samples changes the number of extracted information bits, and the binary image assembled from these bits may be warped beyond recognition. This thesis proposes a simple and practical scheme called post-processing to resist time-scale modification. The scheme scales the extracted information bits back to their original size by a combination of interpolation and decimation. The scaling ratio is the one that gives the minimum detection error, found by exhaustive search, and the search is split into a coarse search and a fine search to increase efficiency. The experimental results show that this scheme has low computation and memory requirements and is very effective against time-scale modification: the detection error rate is lower than 16% when the time-scale modification ratio is between 0.3 and 2.0.
Pitch-scale modification is implemented as time-scale modification followed by resampling. Because resampling applies uniformly spaced decimation and interpolation, no data segment is left unchanged, and in theory correlation detection cannot extract the watermark. A large number of experiments, however, showed that the optimum linear prediction error filter applied before the correlation is a high-pass filter with nonlinear phase, and that the watermark can be extracted successfully after pitch-scale modification by substituting a linear-phase finite impulse response high-pass filter for this filter and using the adaptive threshold. The experimental results show that this method is very effective against pitch-scale modification: the detection error rate is lower than 15% when the pitch-scale modification ratio is between 0.7 and 2.3. Because high-pass filtering reduces robustness to attacks such as low-pass filtering, the final detector combines the two detectors (one using the optimum prediction error filter, the other the linear-phase high-pass filter): the maximum of the two correlation values is taken and compared against the threshold. Combining the two detection schemes also further improves robustness to other attacks.
In short, the watermark detection algorithm proposed in this thesis is practical because of its robustness to almost all attacks and its low computational complexity.
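The post-processing step for time-scale modification can be sketched as follows, under stated assumptions. The abstract specifies only that the extracted bits are rescaled by interpolation and decimation and that the ratio is found by a coarse search followed by a fine search that minimizes the detection error. Everything else here is hypothetical: the nearest-neighbour resampling, the availability of a reference watermark to measure the error against, the step sizes, and all function names.

```python
def resample_bits(bits, target_len):
    """Nearest-neighbour rescaling of a bit sequence to target_len entries.

    Stretching repeats bits (interpolation); shrinking drops bits (decimation).
    Each output position samples the input at its centre.
    """
    n = len(bits)
    return [bits[((2 * i + 1) * n) // (2 * target_len)] for i in range(target_len)]


def bit_error_rate(candidate, reference):
    """Fraction of positions where the two equal-length sequences differ."""
    return sum(x != y for x, y in zip(candidate, reference)) / len(reference)


def error_at_scale(extracted, reference, scale):
    """Error after undoing an assumed time-scale ratio `scale`."""
    span = max(1, min(len(extracted), round(len(reference) * scale)))
    restored = resample_bits(extracted[:span], len(reference))
    return bit_error_rate(restored, reference)


def search_scale(extracted, reference, lo=0.3, hi=2.0, coarse=0.1, fine=0.01):
    """Coarse search over [lo, hi], then a fine search around the coarse best."""
    def best_of(start, stop, step):
        count = int(round((stop - start) / step)) + 1
        candidates = [start + k * step for k in range(count)]
        return min(candidates, key=lambda s: error_at_scale(extracted, reference, s))

    s0 = best_of(lo, hi, coarse)
    s1 = best_of(max(lo, s0 - coarse), min(hi, s0 + coarse), fine)
    return s1, error_at_scale(extracted, reference, s1)


# Hypothetical 64-bit watermark stretched by a ratio of 1.5, followed by
# unrelated trailing bits from the rest of the attacked audio.
reference = [(i * i + (7 * i) // 3) % 2 for i in range(64)]
extracted = resample_bits(reference, 96) + [i % 2 for i in range(40)]
scale, error = search_scale(extracted, reference)
```

The coarse pass evaluates 18 ratios and the fine pass about 21, compared with roughly 170 for a single pass at the fine step, which is the efficiency gain the two-stage search is meant to provide.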
【Key words】 audio watermarking; spread-spectrum technique; psychoacoustic model; masking; time-scale modification; pitch-scale modification;