Research on Human Motion Modeling and Pose Estimation from Monocular Videos

【Author】 Ouyang Yi

【Advisor】 Zhang Sanyuan

【Author information】 Zhejiang University, Computer Science and Technology, 2012, PhD

【Abstract】 Automatic understanding of human motion and pose estimation in monocular video has long been an active topic in computer vision. This thesis studies video-based human detection and motion analysis from five aspects: 3D human motion capture, pedestrian detection, human motion feature extraction, human motion tracking, and human pose estimation. On this basis, it builds human motion models and estimates human pose from monocular images.

To detect people in video, including under occlusion, the thesis proposes a human detection method based on Window Gradient Potential Energy (WGPE). During feature-window scanning, a cascade of SVMs trained with weighted positive and negative samples (occluded samples are weighted) detects partially occluded people, and filtering with a sparse-dense set of window potential energies shortens multi-scale detection time. Because WGPE reuses the gradient information already computed for HOG features, the filtering step adds little computational overhead. On images with smooth backgrounds, the method needs less detection time than conventional HOG detection, and at the same accuracy it is faster than Multi-level HOG and HOG-LBP (Histograms of Oriented Gradients and Local Binary Pattern) detection; on more complex backgrounds its detection time is comparable to conventional HOG. Experiments show improved detection accuracy and efficiency, with a marked accuracy gain for partially occluded people.

For pose estimation in static images, a Bayesian model based on edge-contour features is proposed to analyze the limbs, and skeleton trajectory graphs are introduced to further improve the accuracy of the limb analysis.

For pose analysis in video images, a method based on the Conditional Random Field (CRF) model is proposed. SIFT features are extracted from the human silhouette images and collected into a SIFT feature database of human motion; the body is modeled as a deformable limb structure based on a CRF, and the CRF is used to estimate the pose. To further improve accuracy and meet real-time requirements, rhythmic feature data are first extracted from the human motion data automatically by the proposed EM-GM algorithm. Human motion in the video is modeled by dynamically building a color-edge feature model of the body, in which the edge information of each limb is matched with Fast Directional Chamfer Matching (FDCM), and a fast limb detection algorithm is proposed. The detection results are then combined with the rhythmic motion information to estimate the 3D pose: during parameter inference, GPLVM first reduces the dimensionality of the motion data, local dynamics are then modeled, and finally the 3D pose parameters are estimated.

The thesis also proposes a constraint-graph-based method for video pose estimation. It first establishes a hierarchical compositional human motion model and defines the limb model, then proposes a motion model based on clusters of related actions. To reduce the search space, it proposes a spanning tree algorithm for the RPC node graph and details the RPC node merging, node splitting, and spanning tree balancing algorithms. Building on this, it proposes a video pose estimation algorithm and an inference algorithm based on the RPC spanning tree model.

Finally, a data-driven Markov chain Monte Carlo (MCMC) method based on projection images of a 3D human action library is proposed for tracking human pose in monocular video. The appearance projections, under different viewing angles, of the basic motion library acquired with motion capture equipment are first clustered. HOG is used to detect the person in the monocular video, which allows the limb positions to be segmented fairly accurately. A 3D pose inference algorithm with an appearance model then analyzes each frame, and a temporally constrained model tracks the target. Combining constraint-graph-driven MCMC with the basic action library yields a model suited to video data, which is applied to data-driven online behavior recognition and improves the ability to model human pose.
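The sparse-dense window filtering described in the abstract can be illustrated with a rough sketch. The abstract does not define WGPE, so the code below assumes a stand-in: the summed gradient magnitude inside a window, computed from an integral image. The function names, window size, strides, and threshold are all hypothetical, and the cascaded SVM stage that would classify the surviving windows is omitted.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2D list of numbers.
    At the image border the difference is one-sided (still halved)."""
    H, W = len(img), len(img[0])
    mag = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            gx = (img[y][min(x + 1, W - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, H - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def integral_image(mag):
    """Summed-area table with an extra zero row and column."""
    H, W = len(mag), len(mag[0])
    ii = [[0.0] * (W + 1) for _ in range(H + 1)]
    for y in range(H):
        row_sum = 0.0
        for x in range(W):
            row_sum += mag[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_energy(ii, y, x, h, w):
    """Assumed stand-in for WGPE: total gradient magnitude in the window."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def candidate_windows(img, win=(16, 8), coarse=8, fine=2, thresh=1.0):
    """Sparse pass on a coarse grid, then a dense pass only around
    coarse windows whose energy reaches the threshold."""
    ii = integral_image(gradient_magnitude(img))
    h, w = win
    H, W = len(img), len(img[0])
    half = coarse // 2
    keep = set()
    for y in range(0, H - h + 1, coarse):
        for x in range(0, W - w + 1, coarse):
            if window_energy(ii, y, x, h, w) < thresh:
                continue  # flat region: skip the dense scan entirely
            for yy in range(max(0, y - half), min(H - h, y + half) + 1, fine):
                for xx in range(max(0, x - half), min(W - w, x + half) + 1, fine):
                    if window_energy(ii, yy, xx, h, w) >= thresh:
                        keep.add((yy, xx))
    return sorted(keep)
```

In this sketch, windows over flat background never reach the dense scan, which is one plausible reading of why the abstract reports the largest time savings on smooth backgrounds. Only the windows returned by `candidate_windows` would be passed on to a HOG-based classifier.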

  • 【Online publication contributor】 Zhejiang University
  • 【Online publication year/issue】 2012, Issue 07