
Object Classification Based on Semi-supervised Learning

【Author】 褚镇飞

【Supervisor】 杨小康

【Author Information】 Shanghai Jiao Tong University, Signal and Information Processing, 2010, Master's thesis

【Abstract】 Object classification is a fundamental problem in machine learning, concerned with classifying and recognizing text, image, and video data. When only a small amount of data is involved, traditional machine learning methods already perform well. However, as the volume of information grows exponentially, obtaining labels for such large amounts of data has become practically infeasible, leaving traditional methods inadequate for these problems. Semi-supervised learning addresses this situation: it takes the information in a small set of labeled data and extends it to the unlabeled data, bridging the severe quantity mismatch between labeled and unlabeled examples. This thesis studies a semi-supervised learning problem in which a small number of hard-to-obtain accurate labels coexist with a large number of easily obtained coarse labels, and examines the robustness of co-training, that is, how errors in the given initial labels affect co-training performance. Building on this robustness analysis, the thesis combines the information bottleneck principle with posterior-probability estimation and proposes a method that generates pseudo-labels through unsupervised learning. Compared with existing methods, this approach needs less label information and has lower computational complexity. To exploit the pseudo-labels, the thesis further proposes a pseudo-label-aided co-training method built on a re-ranking framework; compared with existing methods, it is more robust to errors in the initial labels and can still train well-performing classifiers when the initial labeled data contains many errors. Finally, the thesis gives a statistical analysis of pseudo-label-aided co-training, studies mathematically why the method improves the robustness of co-training, and discusses the theoretical similarity between naive Bayes classification and the information bottleneck method.
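For illustration only, the sketch below shows a minimal two-view co-training loop in the spirit of the standard Blum–Mitchell procedure, not the thesis's pseudo-label-aided re-ranking method. It assumes two feature views of the same samples, uses scikit-learn's MultinomialNB as the base learner for each view (echoing the naive Bayes discussion above), and the parameters `rounds` and `per_round` are hypothetical knobs controlling how many confident pseudo-labels are promoted from the unlabeled pool per iteration.

```python
# Minimal co-training sketch (illustrative; not the thesis's algorithm).
# Assumes non-negative count-style features, as required by MultinomialNB.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_round=5):
    """Grow the labeled set by letting two view-specific classifiers
    pseudo-label the unlabeled pool for each other."""
    X1_l, X2_l, y_l = X1_l.copy(), X2_l.copy(), y_l.copy()
    pool = np.arange(len(X1_u))                  # indices still unlabeled
    clf1, clf2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        clf1.fit(X1_l, y_l)
        clf2.fit(X2_l, y_l)
        for clf in (clf1, clf2):
            if len(pool) == 0:
                break
            X_view = X1_u if clf is clf1 else X2_u
            proba = clf.predict_proba(X_view[pool])
            top = np.argsort(proba.max(axis=1))[-per_round:]   # most confident
            picked = pool[top]
            pseudo = clf.classes_[proba[top].argmax(axis=1)]   # pseudo-labels
            # Both views share the same growing labeled set.
            X1_l = np.vstack([X1_l, X1_u[picked]])
            X2_l = np.vstack([X2_l, X2_u[picked]])
            y_l = np.concatenate([y_l, pseudo])
            pool = np.setdiff1d(pool, picked)
    clf1.fit(X1_l, y_l)
    clf2.fit(X2_l, y_l)
    return clf1, clf2
```

In the thesis's setting, the pseudo-labels would instead come from unsupervised information bottleneck clustering combined with posterior probabilities, and a re-ranking step would guard against errors in the initial labels; the loop above only shows the generic confidence-based label exchange between two views.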

  • 【CLC Number】 TP181; TP391.4
  • 【Cited By】 2
  • 【Downloads】 213