增量型目标跟踪关键技术研究

Research on Key Techniques of Incremental Object Tracking

【Author】 Qian Cheng

【Advisor】 Zhang Sanyuan

【Author information】 Zhejiang University, Computer Science and Technology, 2011, Ph.D.

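【Abstract (translated from the Chinese original)】 In today's information society, video-related applications such as video surveillance, video retrieval, human-computer interaction, and video coding keep emerging, and with them comes a massive volume of video data. Advances in computer vision research provide the technical support for processing this data. With video processing techniques such as object detection and tracking, computers can automatically analyze and recognize the motion states and behaviors of objects, which makes video processing more intelligent. After long-term research, object tracking technology has made considerable progress, but stable and accurate tracking of arbitrary objects in complex scenes remains difficult. The appearance model is the core component of an object tracking framework: it judges candidate object image regions in the video from low-level image information, and therefore directly affects tracking performance. However, viewpoint changes, illumination changes, and partial occlusion all introduce tracking errors and can even cause the object to be lost. Building an adaptive appearance model is the key to improving tracking performance. Accordingly, starting from subspace learning and supervised learning methods, this dissertation constructs object appearance models that can be updated online in an incremental manner, and on this basis designs tracking algorithms to track objects in video. The main research contents are as follows:

(1) An object tracking method based on a weighted incremental subspace is proposed. A low-dimensional subspace is constructed from the set of object images in the video and serves as the appearance model of the object. Image samples are collected in subsequent frames and reconstructed from their projections in the subspace. The difference between each original image and its reconstruction is used to estimate the likelihood that the image is the object region, and the sample with the maximum likelihood is selected as the object image region. Based on the object image sample determined in the current frame, the appearance model is updated incrementally. Considering the temporal nature of video, samples are assigned time-dependent weights, which yields a subspace appearance model that better fits the current appearance of the object. Experimental results under viewpoint changes, illumination changes, and partial occlusion verify the stability and accuracy of the method.

(2) An object tracking method based on incremental non-negative matrix factorization is proposed. Non-negative matrix factorization is used to build a low-dimensional subspace description of the object image set, so that each image sample can be represented linearly by a set of basis images. Because the coordinate vectors of object images in adjacent frames are strongly correlated, the object region in the current frame can be determined from the known object image in the previous frame: the image sample with the maximum correlation is selected as the object image region in the current frame. Once the object image is determined, the basis images are updated incrementally, which completes the online update of the subspace. Finally, the characteristics of the method are summarized from the results of tracking rigid and non-rigid objects in complex scenes.

(3) An object tracking method based on an incremental linear discriminant space is proposed. A one-dimensional linear discriminant space is constructed from existing sets of object-class and background-class image samples. Image samples collected in subsequent frames are projected into this space, and the Euclidean distance between the projection of a sample and the mean projection of the object class measures the likelihood that the sample is the object image region. The sample with the maximum likelihood is taken as the object region in the current frame. After tracking, the sufficient spanning sets of the between-class scatter matrix and of the total scatter matrix of the labelled image samples are computed separately, and the projection matrix is updated incrementally on this basis so that the discriminant space retains its discriminative power. Experimental results show that the method can track objects in an affine-invariant manner.

(4) An object tracking algorithm based on incremental asymmetric Boosting is proposed. A strong classifier is built with the Boosting algorithm and used to classify image samples collected from video frames, thereby determining the object image. Multiple classifier pools are set up. Once the object-class and background-class samples are determined, every weak classifier in the pools is updated online, the weak classifier with the minimum classification error is selected from each pool to build the strong classifier, and the strong classifier is updated online. In addition, the weight distribution of the training samples is adjusted according to the type of classification error made by each weak classifier, which overcomes the asymmetry in the number of samples between the two classes and gives the strong classifier good detection ability for objects in video. Finally, tracking experiments on multiple classes of objects verify the generalization ability of the method.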
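
The candidate-selection step of contribution (1) can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes an orthonormal basis and a mean patch are already available, uses a Gaussian of the squared reconstruction error as the likelihood (an assumed form), and omits the weighted incremental update. The names `project`, `reconstruct`, `likelihood`, `best_candidate`, and the parameter `sigma` are hypothetical.

```python
import math

def project(basis, mean, patch):
    """Coordinates of a flattened patch in the subspace spanned by `basis`.

    `basis` is assumed to be a list of orthonormal vectors (hypothetical input).
    """
    centered = [p - m for p, m in zip(patch, mean)]
    return [sum(b_i * c_i for b_i, c_i in zip(b, centered)) for b in basis]

def reconstruct(basis, mean, coords):
    """Map subspace coordinates back to the original patch space."""
    recon = list(mean)
    for c, b in zip(coords, basis):
        for i, b_i in enumerate(b):
            recon[i] += c * b_i
    return recon

def likelihood(basis, mean, patch, sigma=1.0):
    """Assumed Gaussian likelihood of the squared reconstruction error."""
    recon = reconstruct(basis, mean, project(basis, mean, patch))
    err = sum((p - r) ** 2 for p, r in zip(patch, recon))
    return math.exp(-err / (2.0 * sigma ** 2))

def best_candidate(basis, mean, candidates, sigma=1.0):
    """Index of the candidate patch with the highest likelihood."""
    scores = [likelihood(basis, mean, c, sigma) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

# Toy usage: a 2-D subspace of a 3-D "patch" space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mean = [0.0, 0.0, 0.0]
candidates = [[1.0, 2.0, 3.0], [1.0, 2.0, 0.1], [0.0, 0.0, 1.0]]
chosen = best_candidate(basis, mean, candidates)
```

In this toy setup the second candidate lies closest to the subspace, so it has the smallest reconstruction error and is the one selected.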

【Abstract】 Society has entered the information age. With the emergence of video applications such as video surveillance, video retrieval, human-computer interaction, and video coding, a huge mass of video data has accumulated. The development of computer vision provides technical support for video processing. With techniques such as object detection and tracking, computers can automatically analyze and recognize the motion states and behaviors of objects, which makes video processing smarter. Thanks to long-term research, object tracking has made great progress, but it is still difficult to track an arbitrary object in complex environments. As the core of an object tracking framework, the appearance model directly influences tracking performance. Based on raw information extracted from the image sequence, the appearance model determines the possible image regions of the object in the video, but changes in viewpoint and illumination and occurrences of partial occlusion all lead to tracking deviations, and may even cause the object to be lost. Establishing an adaptive appearance model is key to object tracking. Inspired by subspace learning and supervised learning, an online appearance model is constructed that can be updated incrementally. Several object tracking algorithms based on this appearance model are devised. The main contributions of the dissertation are summarized as follows:

(1) A method for object tracking based on incremental weighted subspace learning is proposed. A low-dimensional subspace is constructed from a set of object image patches. In the subsequent frames, a group of image patches is sampled and projected into the low-dimensional subspace, and the image patches are reconstructed from these projections. The differences between the original image patches and the reconstructed image patches measure the likelihoods of their being the object image patch. The image patch with the minimal difference is regarded as the object region. Based on the new object image, the appearance model is updated incrementally. To take time into account, each sample is assigned a time-dependent weight, which makes the subspace describe the appearance of the object more precisely. Experiments on object tracking under viewpoint changes, illumination variations, and partial occlusion verify the stability and accuracy of the method.

(2) A method for object tracking based on incremental non-negative matrix factorization is proposed. First, non-negative matrix factorization is used to establish a low-dimensional subspace describing the appearance of the object, and each image sample is described as a linear combination of a set of image bases. Because the low-dimensional coordinates of object image patches in adjacent frames are strongly correlated, the object region in a subsequent frame can be determined from the object region in the previous frame. The image patch with the strongest correlation is selected as the object region in the current frame. After the object region is determined, the image bases are updated incrementally in order to adjust the entire subspace online. Finally, the results of tracking rigid and non-rigid objects illustrate the characteristics of the method.

(3) A method for object tracking based on an incremental discriminative linear subspace is proposed. A set of image patches labelled as object and background is used to construct a one-dimensional discriminative subspace. Image patches in the subsequent frames are projected into the discriminative subspace, and the distance between the projection of each image patch and the centroid of the cluster of object images is computed. The distance measures the likelihood of each image patch being the object image. The image patch with the maximal likelihood is regarded as the object region in the current frame. After the object is tracked, the sufficient spanning sets of the between-class scatter matrix and the total scatter matrix are computed. Based on these sufficient spanning sets, the projection matrix is updated to maintain the discriminative power of the linear subspace. Experiments on object tracking show that the method can accomplish affine-invariant tracking.

(4) A method for object tracking based on incremental asymmetric Boosting is proposed. A strong classifier is constructed following the Boosting algorithm and is used to classify the image patches sampled from the frames. Object regions are determined through this classification. To update the appearance model of the object online, several pools are set up, each of which contains many weak classifiers. After the object region in the current frame is identified, every weak classifier is updated incrementally. The weak classifier with the minimal error is selected from each pool, and all selected weak classifiers are combined to construct the strong classifier. In addition, the distribution of the weights of the training samples is adjusted to overcome the asymmetry in the number of samples between the two classes, which improves the detection of objects in the video. Finally, experiments on tracking different objects verify the generalization ability of the method.
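
The scoring step of contribution (3) can be sketched as follows. This is only an illustration under stated assumptions: it uses the classical batch two-class Fisher direction computed from the within-class scatter on 2-D feature vectors, whereas the dissertation updates the projection incrementally from sufficient spanning sets of the between-class and total scatter matrices, which is not reproduced here. The names `mean2`, `scatter2`, `fisher_direction`, `pick_object`, and the `ridge` regularizer are hypothetical.

```python
def mean2(samples):
    """Mean of a list of 2-D feature vectors."""
    n = len(samples)
    return [sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n]

def scatter2(samples, mu):
    """2x2 scatter matrix of the samples around mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in samples:
        d = [x[0] - mu[0], x[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(obj, bg, ridge=1e-6):
    """Batch two-class Fisher direction w = Sw^-1 (m_obj - m_bg).

    `ridge` is an assumed small regularizer that keeps Sw invertible.
    """
    mo, mb = mean2(obj), mean2(bg)
    so, sb = scatter2(obj, mo), scatter2(bg, mb)
    a = so[0][0] + sb[0][0] + ridge
    b = so[0][1] + sb[0][1]
    c = so[1][0] + sb[1][0]
    d = so[1][1] + sb[1][1] + ridge
    det = a * d - b * c
    diff = [mo[0] - mb[0], mo[1] - mb[1]]
    # Apply the closed-form inverse of the 2x2 matrix [[a, b], [c, d]].
    w = [(d * diff[0] - b * diff[1]) / det, (-c * diff[0] + a * diff[1]) / det]
    return w, mo

def pick_object(w, obj_mean, candidates):
    """Index of the candidate whose 1-D projection is nearest the object mean."""
    target = w[0] * obj_mean[0] + w[1] * obj_mean[1]
    dists = [abs(w[0] * x[0] + w[1] * x[1] - target) for x in candidates]
    return min(range(len(candidates)), key=dists.__getitem__)

# Toy usage with hand-made 2-D features for the two classes.
obj = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
bg = [[-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0], [-3.0, -3.0]]
w, obj_mean = fisher_direction(obj, bg)
chosen = pick_object(w, obj_mean, [[-2.4, -2.6], [2.6, 2.4], [0.0, 0.0]])
```

Because the discriminant space is one-dimensional, the Euclidean distance between projections reduces to an absolute difference, and the smallest distance corresponds to the maximal likelihood.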

  • 【Online publication submitter】 Zhejiang University
  • 【Online publication year and issue】 2011, Issue 07