Kernel Based Learning Algorithm and Application

【Author】 Jian Ling (渐令)

【Author Information】 Dalian University of Technology, Operations Research and Cybernetics, 2012, PhD

【Abstract (Chinese version)】 The kernel trick is a powerful tool for solving nonlinear problems, and kernel-based learning theory and algorithms are a research focus in machine learning. This thesis studies the design of kernel learning algorithms and several of their applications to the blast furnace ironmaking process and to protein identification. On the algorithm-design side, a binary coding Support Vector Machine (SVM) algorithm is designed that converts an N-class problem into ⌈log₂N⌉ binary subproblems. The conventional one-against-one method needs O(N²) sub-classifiers and the one-against-all method needs O(N), so binary coding SVM markedly improves efficiency by requiring far fewer sub-classifiers. The Multiple Kernel Learning (MKL) problem for the Least Squares SVM (LS-SVM) is converted into a Semidefinite Programming (SDP) problem, so that the kernel coefficients and the regularization parameter are optimized within a unified MKL framework, which in turn enables automatic selection of the kernel and the regularization parameter. Compared with SVM MKL, LS-SVM MKL greatly reduces computational complexity while maintaining accuracy, and numerical experiments on the UCI benchmark repository verify the effectiveness of the designed LS-SVM MKL algorithm.

Prediction and trend classification of furnace temperature in the blast furnace ironmaking process is one of the application problems studied. Taking the silicon content of hot metal ([Si]), an important indicator of the thermal state inside the furnace, as the object of study, a Sliding Windows (SW) mechanism is introduced into the Smooth Support Vector Regression (SSVR) model to build the SW-SSVR model, which continually updates its training samples and can therefore track changes in the system promptly. In numerical experiments on [Si] forecasting, the SW-SSVR model showed a high prediction success rate and short computation time, making it suitable for online use. The [Si] trend forecasting problem is converted into a four-class problem (sharp rise, slight rise, slight fall, sharp fall), and binary coding SVM is applied to forecast the [Si] trend of two blast furnaces in China; this model lets the blast furnace foreman decide the magnitude of control adjustments as well as the direction of furnace temperature control. MKL is used to integrate the heterogeneous data arising in the ironmaking process, which improves prediction accuracy, and to perform feature reduction on the variables collected from the furnace, which improves the interpretability of the black-box model.

Peptide identification based on tandem mass spectrometry (MS/MS) is the other application problem studied. Proteomics is a frontier research focus of the post-genomic era, and high-throughput experimental techniques such as tandem mass spectrometry and protein chips have greatly advanced it. Identifying peptide sequences by tandem mass spectrometry, and from them the proteins, is a common method in current proteomics research. Because protein samples and biological experiments are complex, mass spectra are noisy, and the peptide matches obtained by database search contain a large number of negative (incorrect) identifications; many algorithms have been proposed to optimize peptide identification, but none yet separates positive from negative identifications perfectly. This thesis therefore applies De-Noise, an algorithm based on MKL SVM, which converts peptide identification from MS/MS data into a special classification problem: the positive-class samples are heavily contaminated and not trustworthy, whereas the negative-class samples are fully trustworthy. De-Noise first performs denoising based on distance relationships, then trains an SVM classifier on the denoised sample set and performs two refinement passes, and finally integrates the enzymatic cleavage information of the peptides to give the identification results. On the SEQUEST search results of three protein data sets, Yeast (LCQ mass spectrometer), UPS1 (LTQ mass spectrometer) and Ta108 (Orbit mass spectrometer), De-Noise was compared with PeptideProphet and Percolator; at a given expected False Discovery Rate (FDR), De-Noise significantly improved the sensitivity and specificity of peptide identification.
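The binary coding decomposition described above can be illustrated with a minimal Python sketch. It assumes, since the abstract does not say, that class i receives the plain binary representation of its index (in sorted class order) and that a prediction is decoded to the class whose code is nearest in Hamming distance. To stay dependency-free, the per-bit learner `centroid_learner` is a hypothetical nearest-centroid stand-in, not an SVM; all function names and the toy features [Δ[Si], |Δ[Si]|] are illustrative only.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Decision = Callable[[Vector], float]


def class_codes(classes: Sequence[str]) -> Dict[str, Tuple[int, ...]]:
    """Assign each of N classes a distinct ceil(log2 N)-bit code (class i -> binary of i)."""
    n_bits = max(1, math.ceil(math.log2(len(classes))))
    return {c: tuple((i >> b) & 1 for b in reversed(range(n_bits)))
            for i, c in enumerate(classes)}


def centroid_learner(X: List[Vector], y: List[int]) -> Decision:
    """Hypothetical stand-in for an SVM: nearest-centroid binary classifier (y in {+1, -1})."""
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == -1])

    def decision(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos  # positive when x is closer to the +1 centroid

    return decision


def train_binary_coding(X, labels, learner=centroid_learner):
    """Train one binary learner per code bit: ceil(log2 N) learners in total."""
    codes = class_codes(sorted(set(labels)))
    n_bits = len(next(iter(codes.values())))
    deciders = []
    for b in range(n_bits):
        # Relabel the whole training set: +1 if the sample's class has bit b set, else -1.
        y = [1 if codes[lab][b] else -1 for lab in labels]
        deciders.append(learner(X, y))
    return codes, deciders


def predict_binary_coding(model, x: Vector) -> str:
    codes, deciders = model
    bits = tuple(1 if f(x) > 0 else 0 for f in deciders)
    # Decode to the class whose code is nearest in Hamming distance.
    return min(codes, key=lambda c: sum(u != v for u, v in zip(codes[c], bits)))


# Toy four-class [Si] trend data; the features [delta, |delta|] are hypothetical.
X = [[-2.0, 2.0], [-1.8, 1.8], [-0.3, 0.3], [-0.2, 0.2],
     [0.2, 0.2], [0.3, 0.3], [1.8, 1.8], [2.0, 2.0]]
labels = (["sharp descent"] * 2 + ["slight descent"] * 2
          + ["slight ascent"] * 2 + ["sharp ascent"] * 2)
model = train_binary_coding(X, labels)
print(len(model[1]), predict_binary_coding(model, [-1.9, 1.9]))  # 2 sharp descent
```

Four classes need only two trained learners here. The linear stand-in can separate slight from sharp changes only because |Δ| is supplied as an explicit feature; a kernel SVM would be expected to learn such a nonlinear split from Δ alone.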

【Abstract】 The kernel trick is a powerful tool for solving nonlinear problems, and kernel-based learning theory and algorithms are a research focus in the machine learning field. This thesis focuses on the design of kernel-based learning algorithms and their application to the blast furnace ironmaking process and to protein identification. The main contributions to algorithm design are as follows. A novel binary coding SVM algorithm is proposed that decomposes an N-class classification task into multiple binary classification problems and requires only ⌈log₂N⌉ binary classifiers, far fewer than the conventional one-against-one (O(N²)) and one-against-all (O(N)) methods. Multiple kernel learning (MKL) for LS-SVM is formulated as a semidefinite program to obtain the globally optimal solution, and the regularization parameter is optimized together with the kernel coefficients in a unified framework, which yields an automatic model-selection process. Compared with SVM MKL, LS-SVM MKL greatly reduces computational complexity while achieving comparable precision, which makes it suitable for large-scale data sets; extensive validation experiments were performed. As one application problem, this thesis studies prediction and trend classification models of furnace temperature in the blast furnace (BF) ironmaking process.
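LS-SVM replaces the SVM quadratic program with a single linear system, which is what makes a kernel-combination approach on top of it attractive. Below is a minimal sketch of an LS-SVM classifier (the Suykens-style dual) on a weighted sum of base kernels, K = Σ_k μ_k K_k. It assumes the weights μ_k and the regularization parameter γ are fixed by hand; the SDP that the thesis uses to optimize them jointly is not reproduced. The kernel choices, widths, weights and helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf(width: float) -> Kernel:
    """Gaussian kernel exp(-||a - b||^2 / (2 width^2))."""
    return lambda a, b: math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * width ** 2))


def linear(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def combined_kernel(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """K(a, b) = sum_k mu_k K_k(a, b); non-negative mu_k keep K positive semidefinite."""
    return lambda a, b: sum(w * k(a, b) for k, w in zip(kernels, weights))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for small examples)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_lssvm(X: Sequence[Vector], y: Sequence[int], kernel: Kernel, gamma: float):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega_ij = y_i y_j K(x_i, x_j); return the decision function."""
    n = len(X)
    A = [[0.0] + [float(t) for t in y]]
    for i in range(n):
        A.append([float(y[i])] + [y[i] * y[j] * kernel(X[i], X[j])
                                  + (1.0 / gamma if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + [1.0] * n)
    b, alpha = sol[0], sol[1:]
    return lambda x: sum(a * t * kernel(xi, x) for a, t, xi in zip(alpha, y, X)) + b


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [-1, -1, 1, 1]
K = combined_kernel([rbf(1.0), linear], [0.5, 0.5])  # hypothetical fixed weights
f = train_lssvm(X, y, K, gamma=10.0)
print(f([1.5]) > 0, f([-1.5]) < 0)  # True True
```

Training is one (n+1)×(n+1) linear solve; the sketch makes no claim about how the thesis's SDP formulation scales.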
Focusing on the silicon content of hot metal ([Si]), a chief indicator of furnace temperature, this thesis exploits the nonlinear approximation ability of SVM to construct data-based models for [Si] prediction. First, a sliding window scheme is incorporated into smooth support vector regression to build the sliding window smooth support vector regression (SW-SSVR) model, which updates its learning samples and tracks state changes of the system in time; applied to [Si] prediction, it achieves a high rate of successful trend prediction, competitive computational speed and timely online service. Second, using the proposed binary coding SVM algorithm, the four-class problem of [Si] change (sharp descent, slight descent, slight ascent and sharp ascent) is reduced to two binary classification problems; the four-class results can in turn guide blast furnace operators in deciding, in advance, the magnitude as well as the direction of control. Third, for predicting the [Si] change trend, MKL is employed to integrate heterogeneous data, which improves prediction accuracy, and to perform feature reduction, which helps explain which variables matter in the black-box model. Peptide identification by tandem mass spectrometry (MS/MS) is the other application addressed in this thesis. Proteomics has become a hot subject in the post-genomic era, and peptide identification by MS/MS is widely used for high-throughput identification of proteins in complex biological samples. A flexible algorithm based on MKL SVM, named De-Noise, is proposed to transform the peptide identification problem into a special binary classification problem.
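Returning to the SW-SSVR model for [Si] prediction: its sliding-window mechanism keeps only the most recent samples and refits as each new measurement arrives. The minimal sketch below shows just that window bookkeeping and a one-step-ahead walk-forward loop. The learner inside the window is a hypothetical stand-in (a Gaussian-weighted mean of the window's targets), not SSVR, and the window size, lag count and bandwidth are illustrative assumptions.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]  # (lagged [Si] features, next [Si] value)


class SlidingWindowRegressor:
    """Keeps only the `window` most recent samples; nothing older influences predictions."""

    def __init__(self, window: int, bandwidth: float = 1.0) -> None:
        # deque(maxlen=...) discards the oldest sample automatically when full.
        self.samples: Deque[Sample] = deque(maxlen=window)
        self.bandwidth = bandwidth

    def update(self, x: Sequence[float], y: float) -> None:
        """Slide the window by one sample (an SSVR-based model would be retrained here)."""
        self.samples.append((tuple(x), y))

    def predict(self, x: Sequence[float]) -> float:
        """Stand-in learner, NOT SSVR: Gaussian-weighted mean of the targets in the window."""
        if not self.samples:
            raise ValueError("window is empty")
        weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x))
                            / (2 * self.bandwidth ** 2))
                   for xi, _ in self.samples]
        total = sum(weights)
        if total == 0.0:  # every sample is far away: fall back to the window mean
            return sum(yi for _, yi in self.samples) / len(self.samples)
        return sum(w * yi for w, (_, yi) in zip(weights, self.samples)) / total


def walk_forward(series: Sequence[float], lags: int, window: int,
                 bandwidth: float = 1.0) -> List[Tuple[float, float]]:
    """Predict series[t] from the previous `lags` values, then reveal it and slide the window."""
    model = SlidingWindowRegressor(window, bandwidth)
    out: List[Tuple[float, float]] = []
    for t in range(lags, len(series)):
        x, y = series[t - lags:t], series[t]
        if model.samples:
            out.append((model.predict(x), y))  # forecast before the true value is known
        model.update(x, y)
    return out


model = SlidingWindowRegressor(window=3)
for i in range(5):
    model.update([float(i)], 10.0 * i)
print([y for _, y in model.samples])  # [20.0, 30.0, 40.0]
print(round(model.predict([3.0]), 6))  # 30.0
```

After five updates with a window of three, only the last three samples remain, so the model reflects the most recent furnace state, which is the property the abstract attributes to SW-SSVR.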
The De-Noise algorithm starts with a pre-processing step in which some noisy target peptide-spectrum matches (PSMs) are eliminated from the target PSM data set to provide a more reliable training set; noisy PSMs are identified by computing their distance to the centroid of the decoy PSMs. After this step, two rounds of refinement are performed to distinguish correct from incorrect PSMs. Finally, proteolytic information is integrated to validate the PSMs. We tested the De-Noise algorithm on three data sets from different mass spectrometry platforms, Yeast (LCQ), UPS1 (LTQ) and Ta108 (Orbit), and compared it with PeptideProphet and Percolator. De-Noise showed superior sensitivity and specificity on all data sets searched. Thus, the De-Noise algorithm can validate database search results effectively.
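The pre-processing step above, and the FDR at which results are compared, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each PSM is taken to be a numeric feature vector, distance is Euclidean, and a fixed fraction (`drop_fraction`, a hypothetical parameter) of the targets closest to the decoy centroid is discarded. The FDR at a score cut-off uses the common target-decoy estimate (decoys above the cut-off divided by targets above it), which presumes target and decoy databases of equal size. The MKL SVM training, the two refinement rounds and the proteolytic step are not shown.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def centroid(rows: Sequence[Vector]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def drop_noisy_targets(targets: Sequence[Vector], decoys: Sequence[Vector],
                       drop_fraction: float) -> List[Vector]:
    """Assumed rule: discard the target PSMs closest to the decoy centroid.

    Targets near the decoy centroid resemble incorrect matches, so they are removed
    before training; survivors keep their original order.
    """
    c = centroid(decoys)
    n_drop = int(drop_fraction * len(targets))
    by_distance = sorted(range(len(targets)), key=lambda i: euclidean(targets[i], c))
    keep = sorted(by_distance[n_drop:])
    return [targets[i] for i in keep]


def score_threshold_at_fdr(target_scores: Sequence[float], decoy_scores: Sequence[float],
                           q: float) -> Optional[float]:
    """Lowest target score s whose estimated FDR = #decoys>=s / #targets>=s is <= q."""
    best = None
    for s in sorted(set(target_scores), reverse=True):
        n_t = sum(1 for v in target_scores if v >= s)
        n_d = sum(1 for v in decoy_scores if v >= s)
        if n_d / n_t <= q:
            best = s
    return best


targets = [[5.0, 5.0], [0.2, 0.1], [6.0, 5.0], [0.1, 0.3]]
decoys = [[0.0, 0.0], [0.4, 0.2], [0.2, 0.4]]
print(drop_noisy_targets(targets, decoys, drop_fraction=0.5))  # [[5.0, 5.0], [6.0, 5.0]]
print(score_threshold_at_fdr([9, 8, 7, 6, 5, 4], [6.5, 3], q=0.1))  # 7
```

The two targets lying next to the decoy cluster are the ones removed, leaving a cleaner positive set for the subsequent SVM training.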

【Key words】 Kernel Learning Algorithm; Multiple Kernel Learning; Blast Furnace Ironmaking Process; Tandem Mass Spectrometry; Peptide Identification

【Online Publication Submitted By】 Dalian University of Technology

【Classification Number】 TP181
【Times Cited】 1
【Downloads】 396