
基于Latent SVM的人体目标检测与跟踪方法研究

Study on Human Target Detection and Tracking Based on Latent SVM

【Author】 Hu Zhenbang (胡振邦)

【Advisor】 Cai Zhihua (蔡之华)

【Author information】 China University of Geosciences, Geoscience Information Engineering, 2013, doctoral dissertation

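【Abstract, translated from the Chinese】 Human target detection and tracking is an active research topic in computer vision, with wide application in intelligent transportation, urban security, human-computer interaction, intelligent robots, video and image analysis, and electronic entertainment. In recent years, with the growth of the urban Internet of Things, human target detection and tracking algorithms have attracted increasing attention. This thesis decomposes the problem into four parts (image detection, background separation, image tracking, and path optimization) and discusses each in turn.

For image detection, the main method studied is Latent SVM. During Latent SVM model training, the training samples are first clustered, and a multi-view observation model is then built from the clustering result. Unlike general SVM pattern recognition algorithms, Latent SVM also generates latent-variable features automatically and adds them to a basic SVM model. A latent variable carries both displacement and appearance attributes, so a Latent SVM detection model trained for human targets can contain several latent variables, which can be loosely understood as local appearance features of the human body such as the head, torso, and limbs. The construction of multiple component models and the automatic generation of latent variables make Latent SVM one of the best image detection algorithms currently available. The study of Latent SVM in this thesis covers both model training and the detection procedure.

Background separation is an auxiliary method in a video detection and tracking system that improves the speed and accuracy of the whole system. Its role is to remove the background regions of a video sequence effectively. The background is the relatively static part of the image sequence, for example swaying leaves, rotating fans, and the shadows of moving targets. Latent SVM detection requires the histogram of oriented gradients (HOG) feature pyramid of the whole image. Because detection time grows with the scanned area, a video detection system with strict real-time requirements can use information from temporally adjacent frames to discard most of the background quickly, shrinking the scanned area and speeding up detection. Because detection runs only inside moving foreground regions, the false detection rate also drops substantially. The thesis reviews several background separation algorithms, including the multivariate Gaussian mixture model, the codebook method, and self-organizing map background separation, and proposes an improved algorithm that combines their strengths.

Compared with image detection, image tracking must not only locate the target but also trace its trajectory. Tracking is also continuous and automatic to a degree, and can make up for some detections missed by the detector. The thesis proposes an improved Mean-Shift tracking algorithm. Combined with background separation, it locates the tracked target precisely, runs faster, and needs only the initial tracking region to complete tracking automatically.

When tracked targets overlap or are occluded, a tracking algorithm inevitably produces tracking errors or loses targets. Tracking path optimization aims to eliminate these errors. Using the positions and appearance of the initially tracked objects, an objective function can be designed and optimized to refine multi-target trajectories. The thesis proposes an improved multi-target tracking algorithm based on reversible jump Markov chain Monte Carlo (RJMCMC) that optimizes the initial tracking paths effectively when detection accuracy is high. The initial trajectories come from the image tracking algorithm. Before path optimization, background separation first extracts the moving foreground regions; Latent SVM detection is then run on each foreground region, and the detections are verified to obtain the tracked objects. This processing minimizes false detections and greatly simplifies the path optimization problem.

In summary, the thesis designs and attempts to implement a complete human target detection and tracking scheme, and studies improvements that address the shortcomings of each component. The main work is as follows.

1) An improved Latent SVM model training algorithm. Because images of human targets vary widely, the automatic generation of latent variables is critical to model training. The original training method first obtains a simple classification template from the HOG features of the sample images with an SVM, and then applies a greedy algorithm to the template to generate latent variables automatically. To obtain a better model, the thesis proposes an image segmentation algorithm that combines Mean-Shift with differential evolution to generate latent-variable features automatically. The new method takes the texture distribution of the sample images into account when searching for local-feature latent variables, which yields a better detection model and ultimately better detection performance.

2) An improved Latent SVM human target detection algorithm. The original algorithm first builds the HOG feature pyramid of the image under detection, then convolves the detection model with each level of the pyramid, and finally determines the target position from the convolution score and the pyramid level. The cascade Latent SVM detection algorithm improves on the original. PCA is first used to analyze the HOG features of the sample set, and both the detection model and the HOG feature pyramid of the image are reduced in dimension. The reduced model is convolved with the reduced pyramid, and only locations whose score exceeds a specified threshold are passed on for further evaluation, which uses the convolution score of the original model and the original pyramid. The advantage of the cascade is that it quickly filters out non-human content in the image. To improve the cascade further, the thesis proposes using LDA to analyze the HOG features of the sample set and obtain the projection vectors. It also proposes an improved local search strategy for latent variables. Finally, it proposes extracting grid color-similarity features from the latent variables and building a model that makes a second decision on each detection, in order to reduce the false alarm rate.

3) A new self-organizing map background separation algorithm. The classic multivariate Gaussian mixture model and its refinement, the codebook method, both treat each pixel as the basic processing unit. These methods do not relate adjacent pixels to one another, adapt poorly to changes in the scene, and produce separation results with many false alarms. A self-organizing map effectively captures the relationships between pixels in the scene and adapts to the scene well. However, its code length is fixed and must be specified manually, and the codebook cannot be modified promptly when the scene changes abruptly. The thesis therefore proposes a new method that combines the codebook method with a self-organizing map to associate adjacent pixels, in which the background code length of each pixel changes automatically as circumstances require. Building on this method, infrared data and color image data are fused, and shadows are removed effectively.

4) A new algorithm for tracking multiple moving targets (pedestrians) in image sequences that combines Mean-Shift with background separation. Background separation is used to improve the classic Mean-Shift algorithm in two ways. First, the effective region of each moving target is extracted, and a correlation function of feature vectors is used as the criterion for locating the tracked object. Second, the tracking region is corrected quickly using the background separation result. In multi-target tracking, the improved algorithm outperforms classic Mean-Shift in running time, robustness, and tracking accuracy.

5) An improved multi-target tracking path optimization algorithm. Even when detection accuracy is very high, a tracking algorithm inevitably produces tracking errors or loses targets when tracked targets overlap or are occluded. The system designed in this thesis first uses background separation to obtain the moving foreground regions, then applies cascade Latent SVM for detection, and then classifies the detections by color similarity to obtain the set of detection results. Finally, this set is combined with the improved Mean-Shift algorithm to obtain the initial trajectories, whose positional accuracy is very high. For this setting, the proposed algorithm simplifies general path optimization algorithms in both the objective function and the optimization strategy. The simplified algorithm no longer uses moves such as trimming, extending, adding, and removing to optimize trajectories, and relies only on splitting and merging.

In summary, the thesis surveys and analyzes algorithms for human target detection and tracking and identifies the shortcomings of each component. It focuses on a Latent SVM model optimization algorithm, the cascade Latent SVM detection algorithm, the self-organizing map background separation algorithm, the Mean-Shift tracking algorithm, and the RJMCMC multi-target tracking optimization algorithm. Experiments demonstrate the effectiveness of each method.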
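
The two-stage idea behind the cascade detector described in contribution 2) can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the flat feature vectors standing in for HOG windows, the precomputed projection `basis` (which would come from PCA or LDA), and both thresholds are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project(basis, x):
    """Coordinates of x along each row of `basis` (rows assumed orthonormal)."""
    return [dot(row, x) for row in basis]


def cascade_detect(windows, template, basis, prune_threshold, detect_threshold):
    """Score candidate windows in two stages.

    windows          : list of feature vectors, one per candidate location
    template         : full-dimensional linear filter (stand-in for a trained model)
    basis            : rows spanning the reduced feature space (assumed precomputed)
    prune_threshold  : minimum reduced-space score needed to reach stage 2
    detect_threshold : minimum full score needed to report a detection
    Returns (index, full_score) pairs for windows that pass both stages.
    """
    reduced_template = project(basis, template)
    detections = []
    for i, x in enumerate(windows):
        # Stage 1: cheap score in the reduced space. A real detector would
        # project the features once per pyramid cell and reuse them for
        # every window; here the projection is redone per window for brevity.
        if dot(reduced_template, project(basis, x)) <= prune_threshold:
            continue
        # Stage 2: full-dimensional score, computed only for survivors.
        full = dot(template, x)
        if full > detect_threshold:
            detections.append((i, full))
    return detections
```

With a two-row basis that keeps only the first two of four feature dimensions, a window whose energy lies entirely in the discarded dimensions is rejected in stage 1 and never reaches the full-dimensional score.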

【Abstract】 Human target detection and tracking is an important research area with many applications, including intelligent transportation systems, intelligent video surveillance, advanced human-machine interfaces, intelligent robots, video analysis, and electronic entertainment. With the development of the Internet of Things, systems for human target detection and tracking have attracted growing interest. In this thesis, the problem is decomposed into image detection, background subtraction, image tracking, and path optimization, which are discussed separately.

Latent SVM (support vector machine) is the main method studied for image detection. In Latent SVM training, the training image samples are clustered into subsets according to aspect ratio. To handle different viewing angles, different components of the model are trained from these subsets. In addition, the SVM model is analyzed further to extract latent variables automatically, which makes Latent SVM more capable, and more complex, than a general SVM algorithm. Each latent variable carries displacement and appearance information. A Latent SVM model of a human can have dozens of latent variables, which can be regarded as local characteristics of the human body such as the head, torso, arms, and legs. The multiple components and the latent variables make Latent SVM one of the best image detection algorithms. The research on Latent SVM in this thesis covers both training and detection.

As an auxiliary method, background subtraction can improve the accuracy and speed of the system. Its goal is to remove the background of a video sequence effectively, including trees waving in the wind, turning fans, and the shadows of moving targets. Latent SVM detects human targets using the histogram of oriented gradients (HOG) pyramid of the image under detection.
Detection time increases with the scanned area. In a real-time video detection system, information from temporally adjacent frames can be used to remove most of the background quickly and improve detection speed. Because scanning is confined to the moving foreground region, the false detection rate is also greatly reduced. This thesis reviews a variety of background subtraction algorithms, including the multivariate Gaussian mixture model, the codebook method, and self-organizing map background separation, and proposes an improved algorithm that integrates their advantages.

Compared with image detection, image tracking must find both the location of the target and its movement path. Tracking has a degree of continuity and automaticity, and can make up for some detections missed by the detector. An improved Mean-Shift tracking algorithm is introduced in this thesis. It is combined with the background subtraction result, which improves both the tracking position and the speed. Given the initial tracking region, the algorithm completes the tracking task automatically.

When a tracked target is concealed by another target or by the background, the tracking algorithm inevitably produces tracking errors or loses the target. Tracking path optimization helps to correct this. Using the location and appearance information of the tracked targets, an objective function can be designed and optimized to obtain the optimal tracking paths. An improved reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is introduced in this thesis; it performs better when detection accuracy is high. Before tracking path optimization, background separation first extracts the foreground region, and Latent SVM is then applied to detect human targets in the foreground area.
Finally, the initial tracking path is obtained from the image tracking algorithm. These steps minimize the false detection rate and greatly simplify the tracking path optimization problem.

In this thesis, a complete prototype of a human target detection and tracking system is designed, and improvements are studied to address the defects and deficiencies of each module. The main contributions of this thesis are as follows.

1) The Latent SVM model training algorithm is improved. Human target images vary widely, so the initialization of latent variables in the Latent SVM model is very important. In the original training process, a simple classification template is trained from the HOG features of the samples by an SVM, and a greedy algorithm is then applied to the template to obtain the latent variables. To obtain a better model, a new image segmentation algorithm based on Mean-Shift and differential evolution is proposed to generate better latent variables. The proposed method takes into account the texture distribution of the positive sample images and automatically searches for the local features that serve as latent variables, yielding a better detection model and better detection performance.

2) The Latent SVM detection algorithm is improved. Latent SVM detects human targets using the HOG pyramid of the image under detection. The original detection algorithm convolves the HOG pyramid with the Latent SVM model, and the location and size of each target are obtained from the convolution score and the pyramid level. The cascade Latent SVM detection algorithm is a fast detection algorithm built on the original one.
First, PCA is applied to the HOG features of the sample set, and a dimension-reduced HOG pyramid is obtained. The cascade algorithm convolves the dimension-reduced HOG pyramid with the dimension-reduced Latent SVM model and selects only locations with sufficiently large convolution scores for further analysis, which consists of the original Latent SVM detection at those locations. In this way the cascade quickly filters out the non-human areas of the image. To further improve the cascade's performance, a dimension reduction algorithm based on LDA is introduced in this thesis, together with a new local search algorithm for latent variables. Finally, to reduce the false alarm rate, a second decision model based on the color similarity of the latent variables is constructed.

3) An improved self-organizing map background separation algorithm is proposed. The classic multivariate Gaussian mixture model and the codebook method both treat each pixel of the image as the basic processing unit. Such methods do not relate adjacent pixels to one another, sometimes adapt poorly to changes in the scene, and can produce separation results with a high false alarm rate. A self-organizing map effectively captures the relationships between pixels in the scene and adapts better to scene changes. However, it requires the code length to be fixed manually, and the codebook cannot be modified when the scene changes abruptly. This thesis proposes a new algorithm that combines the codebook method with a self-organizing map, associates adjacent pixels, and allows a variable-length codebook.
Finally, an improved algorithm based on this algorithm and SVM is introduced and applied to the fusion of infrared and color image data to remove shadows effectively.

4) Mean-Shift and background subtraction are combined to track multiple people in image sequences, with two contributions. First, the effective area of each moving target is extracted, and the correlation of feature vectors is used as the criterion for locating the tracked target. Second, the tracking region is corrected quickly using the foreground-background segmentation result. The improved algorithm outperforms the conventional Mean-Shift algorithm in running time, robustness, and tracking accuracy.

5) An improved tracking path optimization algorithm is introduced in this thesis. When a tracked target is concealed by another target or by the background, the tracking algorithm inevitably produces tracking errors or loses the target. The human target detection and tracking system first uses background subtraction to obtain the foreground area. An improved cascade Latent SVM algorithm is then run in the foreground area to detect human targets, and the detections are checked by the color similarity model. The initial tracking path is obtained from the detection results and the Mean-Shift algorithm. These steps ensure high detection accuracy. In this setting, the improved algorithm performs better than the original tracking path optimization algorithm. It uses a simpler objective function and a simpler optimization strategy, which drops the decrease, increase, add, and delete moves and keeps only merge and split.

In summary, by analyzing algorithms related to human target detection and tracking, this thesis points out their drawbacks.
Based on this analysis, the thesis mainly studies the training algorithm of Latent SVM, the cascade Latent SVM detection algorithm, the self-organizing map background subtraction algorithm, the Mean-Shift image tracking algorithm, and the RJMCMC multi-target tracking optimization algorithm. For each improved algorithm, experiments were conducted to validate its performance.
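
The combination of Mean-Shift tracking with a background subtraction result, as in contribution 4), can be pictured with a minimal sketch. It is a plain mean-shift window update in which pixels flagged as background are given zero weight; it is not the thesis's improved algorithm. The function name, the square window, the per-pixel `weights` map (for example, how well each pixel matches the target's color histogram), and the binary `foreground` mask are all assumptions made for illustration.

```python
def mean_shift_window(weights, foreground, center, half_size, max_iter=20):
    """Move a square window to the local centroid of foreground-masked weights.

    weights    : 2D list of non-negative per-pixel weights
    foreground : 2D list of 0/1 flags from a background subtraction step
    center     : (row, col) of the initial window centre
    half_size  : the window spans rows and columns within +/- half_size
    Returns the (row, col) centre after convergence or max_iter iterations.
    """
    rows, cols = len(weights), len(weights[0])
    r, c = center
    for _ in range(max_iter):
        total = sum_r = sum_c = 0.0
        for i in range(max(0, r - half_size), min(rows, r + half_size + 1)):
            for j in range(max(0, c - half_size), min(cols, c + half_size + 1)):
                # Background pixels contribute nothing to the centroid.
                w = weights[i][j] * foreground[i][j]
                total += w
                sum_r += w * i
                sum_c += w * j
        if total == 0:
            break  # no foreground support inside the window
        new_r, new_c = round(sum_r / total), round(sum_c / total)
        if (new_r, new_c) == (r, c):
            break  # the window has stopped moving
        r, c = new_r, new_c
    return r, c
```

In this sketch a strongly weighted pixel that belongs to the background cannot pull the window away from the moving target, which is one way to picture why restricting tracking to foreground regions can help.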
