Research on 3D Human Pose Estimation from Monocular Video for Human-Computer Interaction
Research on Human Pose Estimation with Monocular Videos for HCI Applications
【Author】 李娜 (Li Na);
【Supervisor】 陈纯 (Chen Chun);
【Author Information】 Zhejiang University, Computer Science and Technology, 2008, Ph.D.
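【Abstract (translated from Chinese)】 Automatically understanding moving human bodies in images or video sequences has long been a focus of computer vision research. Beyond humanity's interest in using machines to explore and imitate itself, an important reason it has become a research hotspot is the rapid development of electronic devices and the huge application market they have created. Targeting human-computer interaction applications, this thesis concentrates on 3D human pose estimation from monocular video. Monocular-video 3D human pose estimation is one of the most challenging problems in computer vision research. The system's observation input is a complex natural image, its state output is a high-dimensional human pose, and the process that maps observation to state is dynamic and nonlinear. In addition, for human-computer interaction applications, the core algorithms of such a system must simultaneously be accurate, robust, and real-time, and the system initialization process should be as automated as possible. To address these problems, this thesis studies each module in turn and integrates the resulting algorithms into a human-computer interaction prototype system, thereby realizing human-computer interaction based on monocular-video 3D human pose estimation. The thesis divides the research into three key technologies: image feature extraction, human pose estimation algorithms, and automation of the initialization process. For image feature extraction, the research targets ordinary low-end cameras and proposes an image feature extraction algorithm based on the HSV color space; adopting the HSV space, which is consistent with human visual perception, improves the effectiveness and robustness of feature extraction. For the pose estimation algorithm, the thesis proposes a mathematical model for 3D human pose estimation that combines a discriminative model with a generative model. The discriminative model determines the subspace of the target pose, and the generative model then solves for the target pose, making full use of the respective strengths of the two kinds of model. For system initialization, the thesis focuses on a framework and evaluation criteria for manually segmenting video objects, which makes it easier for users to assist in collecting training data and reduces the user's interaction workload during initialization. Based on these core algorithm designs, the thesis develops a new real-time human-computer interaction system based on limb-motion control. To verify the system's effectiveness, it further develops a simple game played through an ordinary webcam, establishing an experimental platform for exploring design methods for interaction based on human motion. Using this platform, the thesis conducts extensive user testing and discusses the problems and opportunities that this new type of human-computer interaction faces in an entirely new design environment. The test results demonstrate the effectiveness of the proposed monocular 3D human pose estimation system and show the distinctive appeal and broad application prospects of this kind of new interaction system based on human motion.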
【Abstract】 Automatically analyzing and understanding human motion has been an important field of computer vision research for many years. This interest is driven not only by human curiosity about exploring and imitating ourselves with computers, but also by the large potential market that is growing with the prevalence of personal computers and consumer electronics. This thesis focuses on 3D human pose estimation with a monocular camera for novel human-computer interaction (HCI).
Monocular 3D human pose estimation is one of the most challenging topics in computer vision. The difficulties lie in both the input and the output: the observation is a complicated natural image, while the system state lies in a high-dimensional space. Inference from the observation to the state is essentially a nonlinear dynamic process. Moreover, for HCI applications a monocular 3D human pose estimation system has to be accurate, robust, and real-time, and its initialization procedure should involve users as little as possible. To meet these requirements, we have designed algorithms for all modules of a monocular 3D human pose estimation system and integrated them into an HCI prototype system, thereby implementing an HCI system based on monocular 3D human pose estimation.
In this work, we define three key technologies for monocular 3D human pose estimation: image feature extraction, human pose estimation, and automatic initialization. Our research on image feature extraction targets commonly used low-end cameras, such as webcams. We adopt the HSV color space, which is consistent with the human visual system, to improve the effectiveness and robustness of image feature extraction. For human pose estimation, we propose a hybrid model that combines a discriminative model and a generative model to estimate 3D pose. The algorithm first locates a local subspace of human pose with a discriminative model and then refines the pose within that subspace with a generative model. In this way, the hybrid model takes advantage of both models. For automatic initialization, we focus on semi-automatic video object segmentation and its evaluation metrics. An efficient tool for video object segmentation helps users provide training data easily and consequently reduces their manual work during initialization.
Based on all the proposed algorithms, we develop a novel HCI system based on human body movement. To further evaluate the HCI system, we implement a webcam-based video game, which can be used for interaction design. Based on this game, we carry out a user study and discuss the problems and opportunities for the novel HCI system. The results of the user study demonstrate the effectiveness of the proposed monocular 3D human pose estimation system and show the attractiveness and promising future of the novel HCI system based on human movement.
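The two-stage idea described in the abstract (a discriminative model narrows the search to a local pose subspace, then a generative model refines the pose inside it) can be illustrated with a toy sketch. This is not the thesis's algorithm: the two-joint planar arm, the nearest-neighbour lookup standing in for the discriminative model, the grid search standing in for generative refinement, and every function name below are assumptions made only for illustration.

```python
import math
from itertools import product


def render(pose):
    """Toy generative (forward) model: a 2-link planar arm with unit-length
    links maps joint angles (t1, t2) to the 2D position of its end point."""
    t1, t2 = pose
    return (math.cos(t1) + math.cos(t1 + t2),
            math.sin(t1) + math.sin(t1 + t2))


def feature_distance(a, b):
    """Euclidean distance between two 2D feature vectors."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def make_training_set(step=0.4, limit=math.pi / 2):
    """Coarse grid of example poses paired with their rendered features."""
    n = int(limit / step) + 1
    poses = [(i * step, j * step) for i in range(n) for j in range(n)]
    return [(render(p), p) for p in poses]


def discriminative_stage(obs, training, k=4):
    """Stand-in discriminative model: take the k training poses whose
    features are closest to the observation and return the bounding box
    of those poses as the local pose subspace."""
    nearest = sorted(training,
                     key=lambda pair: feature_distance(pair[0], obs))[:k]
    poses = [pose for _, pose in nearest]
    lower = tuple(min(p[d] for p in poses) for d in range(2))
    upper = tuple(max(p[d] for p in poses) for d in range(2))
    return lower, upper


def generative_stage(obs, lower, upper, samples=41):
    """Stand-in generative refinement: densely sample poses inside the
    subspace and keep the one whose rendering best matches the observation."""
    best_pose, best_err = None, math.inf
    for i, j in product(range(samples), repeat=2):
        pose = (lower[0] + (upper[0] - lower[0]) * i / (samples - 1),
                lower[1] + (upper[1] - lower[1]) * j / (samples - 1))
        err = feature_distance(render(pose), obs)
        if err < best_err:
            best_pose, best_err = pose, err
    return best_pose, best_err


def estimate_pose(obs, training):
    """Hybrid estimate: coarse subspace first, then refinement within it."""
    lower, upper = discriminative_stage(obs, training)
    return generative_stage(obs, lower, upper)


if __name__ == "__main__":
    training = make_training_set()
    observation = render((0.5, 0.9))  # synthetic observation of a known pose
    print(estimate_pose(observation, training))
```

In this toy setting the coarse lookup keeps the fine search cheap, because the dense sampling only covers a small box of poses rather than the whole pose space. The neighbour count `k` and the bounding-box subspace are arbitrary heuristics; a real system would operate on image features and a full-body model.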