面向人机交互的单目视频三维人体姿态估计研究

Research on Human Pose Estimation with Monocular Videos for HCI Applications

【Author】 Li Na (李娜)

【Author information】 Zhejiang University, Computer Science and Technology, 2008, PhD

【Abstract】 Automatically analyzing and understanding human motion has been an important area of computer vision research for many years. The interest is driven not only by human curiosity about exploring and imitating ourselves with machines, but also by the large potential market that is growing with the prevalence of personal computers and consumer electronics. This thesis focuses on 3D human pose estimation with a monocular camera for novel human-computer interaction (HCI).

Monocular 3D human pose estimation is one of the most challenging topics in computer vision. The difficulties lie in both the input and the output: the observation is a complex natural image, while the system state lies in a high-dimensional space, and inference from observation to state is essentially a nonlinear dynamic process. Moreover, for HCI applications a monocular 3D human pose estimation system has to be accurate, robust and real-time, and its initialization procedure should involve the user as little as possible. To meet these requirements, we have designed algorithms for every module of a monocular 3D human pose estimation system and integrated them into an HCI prototype system, thereby implementing HCI based on monocular 3D human pose estimation.

In this work, we identify three key technologies for monocular 3D human pose estimation: image feature extraction, human pose estimation, and automatic initialization. Our research on image feature extraction targets commonly used low-end cameras, such as web cameras. We adopt the HSV color space, which is consistent with human visual perception, to improve the effectiveness and robustness of image feature extraction. For human pose estimation, we propose a hybrid model that combines a discriminative model and a generative model to estimate 3D pose. The algorithm first locates a local subspace of human pose with the discriminative model, and then refines the pose within that subspace with the generative model. In this way, the hybrid model takes advantage of the strengths of both. For automatic initialization, we focus on semi-automatic video object segmentation and its evaluation metrics. An efficient tool for video object segmentation helps users provide training data easily and consequently reduces their manual work during initialization.

Based on the proposed algorithms, we develop a novel real-time HCI system based on human body movement. To further evaluate the system, we implement a video game controlled through an ordinary web camera, which also serves as an experimental platform for studying interaction design. Using this game, we carry out a user study and discuss the problems and opportunities facing this novel kind of HCI. The results of the user study demonstrate the effectiveness of the proposed monocular 3D human pose estimation system, and also show the appeal and promising future of novel HCI systems based on human movement.

【Keywords】 3D human body; pose estimation; human-computer interaction; human pose; algorithm design; computer vision; image feature extraction; image edges; color space; model optimization

【Online publication submitter】 Zhejiang University

【Classification number】 TP391.41
【Citation count】 7
【Download count】 1159