互联网图像高效标注和解译的关键技术研究

Research of Large-Scale Web Image Annotation and Interpretation

【Author】 Xia Dingyin (夏丁胤)

【Author information】 Zhejiang University, Computer Science and Technology, 2010, PhD

【Abstract (translated from Chinese)】 Automatic annotation and understanding of web images is an effective and practical way to support large-scale web image retrieval, and it has become a heavily studied topic in both academia and industry. This dissertation studies several key problems: mining the latent associations between the visual content of an image and the semantics of its surrounding text, image interpretation, large-scale data clustering, and deep learning of visual image features. The main contributions are as follows.

1. A data-driven framework for automatic web image annotation and understanding is proposed (Automatic Web Image Annotation and Interpretation, AWIAI). To annotate an image, AWIAI first computes the visibility attribute of each word in the surrounding text and uses it to build an "image-word" relation matrix. It then applies latent grammar analysis to this matrix to expand the set of candidate annotation words. Finally, it combines unsupervised learning of the visual content with an analysis and ranking of pairwise word co-occurrence to produce the final annotation.

2. Building on the annotation results, the concept of image interpretation and a concrete method for it are proposed. Existing automatic annotation methods do not analyze the grammatical relations among annotation words, so their output is a set of discrete words that cannot describe the rich semantics of an image in natural language (for example, producing "pandas eat bamboo" for an image). After AWIAI produces the annotation words, the proposed method analyzes the sentence-level relations among them, generates syntactic groups, and interprets the content of the target image in natural language.

3. For large-scale data with dense similarity relations, two improved affinity propagation clustering methods are proposed. The first speeds up global message passing by passing messages locally during clustering. The second passes messages over locally sampled data and then embeds the remaining data to obtain a fast global approximation. AWIAI is data-driven at its core, so it must solve the hard problem of clustering large-scale data efficiently.

4. For image understanding in AWIAI, a deep learning method that combines model-based and data-driven learning (Deep Model-based and Data-driven, DMD) is proposed to extract the most discriminative visual features. Recent results in theoretical neuroscience suggest that the brain perceives external visual information through a layer-by-layer learning process. DMD extracts visual features through a deep learning pipeline that proceeds from simple to complex: it first obtains features by unsupervised learning and sparsifies them, then performs semantic understanding and annotation of images by supervised learning.
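As a rough illustration of the final ranking step mentioned in the abstract (analyzing pairwise word co-occurrence and sorting the candidates), the sketch below scores each candidate word by how often it co-occurs with the other candidates. The scoring rule, the function names `cooccurrence_counts` and `rank_candidates`, and the toy corpus are assumptions made for illustration; the abstract does not specify how AWIAI computes or combines these statistics.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count how often each unordered word pair appears in the same tag set."""
    counts = Counter()
    for tags in tag_sets:
        # Sorting makes every pair key canonical: (smaller_word, larger_word).
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def rank_candidates(candidates, tag_sets):
    """Rank candidates by total co-occurrence with the other candidates (assumed rule)."""
    counts = cooccurrence_counts(tag_sets)

    def score(word):
        return sum(counts[tuple(sorted((word, other)))]
                   for other in candidates if other != word)

    # Highest score first; ties fall back to alphabetical order.
    return sorted(candidates, key=lambda w: (-score(w), w))


# Hypothetical toy corpus: each set holds the words attached to one image.
corpus = [
    {"panda", "bamboo", "zoo"},
    {"panda", "bamboo"},
    {"panda", "zoo"},
    {"car", "road"},
]
ranking = rank_candidates(["panda", "bamboo", "zoo", "car"], corpus)
print(ranking)  # ['panda', 'bamboo', 'zoo', 'car']
```

In this toy corpus "car" never co-occurs with the other candidates, so it drops to the bottom of the list, which is the kind of filtering a co-occurrence ranking is meant to provide.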

【Abstract】 Automatic web image annotation and understanding is a practical and effective way to support large-scale web image retrieval, and it has become a hot topic in both academic and industrial research. This dissertation focuses on four research issues: mining the relevance between visual features and surrounding text, image interpretation, large-scale data clustering, and deep learning of image features.

To address these issues, this dissertation proposes a data-driven framework for automatic web image annotation and understanding (Automatic Web Image Annotation and Interpretation, AWIAI). To annotate images with suitable words, AWIAI first calculates the visibility of words in the surrounding text to build the "image-word" matrix, then extends the initial annotation result by latent visual and semantic analysis, and finally obtains the annotation words by unsupervised learning of visual correlation and of the co-occurrence of annotation words.

Current image annotation approaches describe image semantics with only a few discrete words, because they neglect the statement-level syntactic correlation among the annotated words. As a result, they are unable to render natural language interpretations of images such as "pandas eat bamboo". To solve this problem, this dissertation proposes "Image Interpretation". The basic idea is to discover the statement-level syntactic correlation among annotated words and to produce interpretation results in natural language.

AWIAI is a data-driven pipeline for image processing, so it often encounters the problem of large-scale data clustering. This dissertation presents two clustering approaches for large-scale data with a dense similarity matrix. Partition Affinity Propagation (PAP) first passes messages within subsets of the data and then merges all of the data together, which effectively reduces the number of clustering iterations. Landmark Affinity Propagation (LAP) first passes messages among the landmark data and then clusters the remaining data; it is a global approximation method that speeds up clustering.

Recent advances in neuroscience indicate that the human brain perceives the outside world through a hierarchical learning process. Motivated by this research, a model-based and data-driven hybrid architecture (DMD) is proposed in AWIAI to improve image annotation by learning discriminative features. DMD first uses a deep learning pipeline to progressively learn visual features from simple to complex. It then integrates the deep model-based and data-driven learning pipelines. After discriminative image representations are obtained from both pipelines by sparse regularization in an unsupervised way, a supervised learning algorithm is applied to predict the objects in images.
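The abstract describes LAP only at a high level: pass messages among landmark points, then cluster the remaining data. A minimal sketch of that idea follows, assuming standard affinity propagation message updates, randomly sampled landmarks, and nearest-exemplar assignment for the non-landmark points. The names `affinity_propagation` and `landmark_ap`, the median preference, the fallback rule, and the toy data are hypothetical choices, not details taken from the dissertation.

```python
import random


def similarity(p, q):
    """Negative squared Euclidean distance, a common similarity for affinity propagation."""
    return -sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.5, iterations=100):
    """Plain affinity propagation on a small point set; returns exemplar indices."""
    n = len(points)
    s = [[similarity(points[i], points[k]) for k in range(n)] for i in range(n)]
    off_diag = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
    preference = off_diag[len(off_diag) // 2]  # assumed: median similarity
    for i in range(n):
        s[i][i] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # a(k,k) <- sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    scores = [r[k][k] + a[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if scores[k] > 0]
    # Fallback (assumption): if no score is positive, keep the best-scoring point.
    return exemplars or [max(range(n), key=scores.__getitem__)]


def landmark_ap(points, n_landmarks, seed=0):
    """Run message passing on sampled landmarks, then assign every point to an exemplar."""
    rng = random.Random(seed)
    landmark_idx = sorted(rng.sample(range(len(points)), n_landmarks))
    landmarks = [points[i] for i in landmark_idx]
    exemplars = [landmark_idx[k] for k in affinity_propagation(landmarks)]
    # Each point joins the exemplar it is most similar to.
    labels = [max(exemplars, key=lambda e: similarity(p, points[e])) for p in points]
    return exemplars, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
        (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.3)]
exemplars, labels = landmark_ap(data, n_landmarks=4)
print(exemplars, labels)
```

Only the landmarks take part in message passing; every other point is compared against the chosen exemplars only, so the expensive step runs on a fraction of the data.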

【Keywords (translated from Chinese)】 Image annotation; image interpretation; word visibility; affinity propagation clustering; deep learning; data-driven
【Key words】 Automatic Image Annotation; Image Interpretation; Word Visibility; Data Clustering; Deep Learning; Data-Driven

【Online publication contributor】 Zhejiang University

【Classification number】 TP391.41
【Times cited】 3
【Downloads】 977