Relative Character Analysis and Burrows-Wheeler Methods for the Biological Sequence

【Author】 杨连平 (Yang Lianping)

【Author information】 Dalian University of Technology, Applied Mathematics, 2011, PhD

【Abstract】 With the arrival of the post-genome era, a large number of completely sequenced genomes and a range of new questions have emerged. Low-cost sequence comparison tools are expected to analyze and predict the structure and function of biological sequences faster and more accurately, reducing the high cost in time and money of experimental determination and analysis. This dissertation focuses on biological sequence analysis and proposes comparison models with distinctive features.

Traditionally, sequence comparison models are divided into two classes: alignment models and alignment-free models. Viewing the models instead through the topological framework of the comparison workflow, this dissertation divides them into character analysis models and relative character analysis models. Alignment models and models based on information compression both belong to the relative character analysis class. The core of a relative character analysis model is its similarity hypothesis, and analyzing that hypothesis reveals the model's main strengths and weaknesses.

The dissertation concentrates on two kinds of relative character analysis models: a comparison model based on common substrings between sequences, and Burrows-Wheeler methods. The common-substring model is derived from the relationship between the longest common substrings and the shortest absent words. Its main features are: the algorithm runs in linear time, which makes it suitable for very long genomes; its local distance measure captures local similarity between genomes well, even when the region under consideration contains recombined segments; and the cumulative local distance derived from the local distance measure can effectively analyze the overall similarity of genomes. The model is validated by subtyping HIV-1 complete genomes and their segments.

Burrows-Wheeler methods are the other kind of relative character analysis model studied here. Their theory rests on the Burrows-Wheeler transform, an important invertible transform in lossless compression. The extended Burrows-Wheeler transform, derived from it, can effectively measure the content of common factors shared between sequences. We propose a concept called the Burrows-Wheeler similarity distribution to describe the similarity between sequences. On this basis, two numerical characteristics of the distribution, expectation and information entropy, are extracted, and different strategies are adopted to compare gene sequences, protein sequences, and structure sequences according to their features.
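The abstract describes the Burrows-Wheeler transform as invertible. As a minimal illustration of that property only, the sketch below implements the standard textbook transform with an end-of-string sentinel and a naive inverse. The function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for this example; the code does not reproduce the dissertation's extended transform or its similarity distribution, and the rotation-sorting approach is far slower than the linear-time methods the abstract refers to.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (textbook version).

    The sentinel is an assumed end marker; it must not occur in text and
    must sort before every other character in the input.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort all cyclic rotations, then read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort the rows.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

Equal characters tend to cluster in the transformed string, which is what makes the transform useful for lossless compression.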

【Key words】 Sequence analysis; Computational biology; Relative character; Common string; Local distance; Burrows-Wheeler method

【Online publication submitter】 Dalian University of Technology

【Classification number】 Q811.4; O242.1
【Download count】 189