A Study of Protein Subcellular Localization for Malus × domestica Borkh. and PCD-Related Proteins Based on Quantum Algorithms

【Author】 Liu Zhixin (刘智新)

【Supervisor】 Yang Hongqiang (杨洪强)

【Author information】 Shandong Agricultural University, Pomology, 2013, PhD

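【Summary】 Research on the proteins and proteomes of animals, plants, and other eukaryotes has become increasingly important in the post-genomic era. With genome sequencing completed for many organisms, including the fruit trees apple and grape, the focus of research has begun to shift toward determining the functions of the protein products of genes. Protein subcellular localization is an important topic in fruit tree proteomics, fruit tree cell biology, and fruit tree molecular bioinformatics. The biological function of a fruit tree protein is closely tied to biological processes such as metabolism and signal transduction, and the protein must also reside in a specific subcellular compartment to carry out that function. For a fruit tree protein of unknown function, obtaining its subcellular location is therefore essential for further study of its molecular function. Localization is usually determined experimentally, but experiments are time-consuming and costly. Given the rapid growth in the number of fruit tree protein sequences, obtaining localization information at scale in a short time (for example, for all proteins encoded by the apple genome) is feasible only with bioinformatics methods.

From the perspective of biological data, bioinformatics can be divided into three main research areas: generating and managing large volumes of biological sequence data, using and analyzing biological data, and researching and developing tools and platforms for biological data analysis. Because biological data are being produced in large quantities and life science research is advancing rapidly, both scientific research and production practice urgently need analysis tools that meet their requirements. In some research projects, such tools have even become the bottleneck that limits deeper investigation. Developing them often requires knowledge from biology, mathematics, physics, chemistry, information science, and other fields, which adds to the complexity of the work. In-depth research on data analysis tools and platforms for fruit trees is therefore necessary and has practical value, and it is one of the aims of this work.

Using quantum algorithms as the main approach, this thesis analyzes in depth the bioinformatics problems in predicting the subcellular localization of programmed cell death (PCD) related proteins and the problem of implementing subcellular localization prediction for apple proteins. Drawing on biophysics and physics, it proposes concrete solutions and implementation schemes. The main work and contributions are summarized as follows:

1. Starting from the composition of a protein's amino acid sequence and borrowing the idea of granularity from physics, the thesis proposes the concept of granularity for protein amino acid sequences and explains how protein granularity is constructed using specific sequence fragments. Analyzing amino acid sequences with protein granularity leads to the further concepts of the order, the bound, the limit, and the increment of protein granularity. Closer study shows that protein granularity is unevenly distributed along a sequence, that every protein sequence has its own granularity limit, and that, across all proteins, the granularity of each order has a common bound. From the standpoint of prediction, protein granularity captures the composition of the amino acid sequence, the order of its residues, and the adjacency of identical amino acids, while the granularity increment naturally captures the length of the sequence. The thesis gives a concrete method for constructing feature vectors of protein sequences from protein granularity and describes the relevant parameters in detail. Using granularity increment information to predict protein secondary structural classes on standard datasets and the sub-chloroplast localization of plant proteins gave better results than earlier work, which further indicates that protein granularity is a very useful indicator of protein properties.

2. The standard apoptosis protein datasets ZD98, ZW225, and CL317 were selected, and protein granularity was used to extract features from the apoptosis protein sequences, yielding 38-dimensional feature vectors. With an improved quantum neural network (QNN) algorithm, subcellular localization prediction for the apoptosis proteins achieved overall accuracies of 87.8%, 83.1%, and 85.5%, respectively. These accuracies equal or exceed those reported by the original authors, showing that combining protein granularity with QNN is effective for predicting the subcellular localization of apoptosis proteins.

3. Using the published apple genome-wide protein sequences, granularity and other features were extracted to obtain feature vectors including second-order granularity composition, third-order granularity composition, and a fusion of multiple granularity spaces. A new quantum algorithm (QSVM) was then developed based on the idea of wave function superposition in quantum mechanics. It was used to predict the subcellular localization of 63,541 amino acid sequences from the apple genome, and the resulting localization information forms apple genome-wide protein subcellular site database 1.

4. Building on a high-quality benchmark dataset of multi-location plant proteins constructed by Chou, the thesis proposes a separate-processing prediction mode in which multi-label and single-label proteins are predicted separately. Features are extracted from the protein sequences using GO annotations. The approach achieved high prediction accuracy and offers a new method for multi-location protein prediction.

5. On the apple genome-wide protein dataset, GO annotation features were extracted for the apple proteins that have GO annotations, followed by protein granularity features based on the theory proposed in this thesis. A new quantum algorithm (SQSVM) was developed and used to predict the subcellular localization of the 15,297 GO-annotated protein sequences selected from the apple genome. The resulting localization information was used to build apple genome-wide protein subcellular site database 2.

6. As a concrete realization of a biological data analysis platform, two subcellular localization websites have been built from the results of this thesis: an apple protein subcellular localization system and a plant protein subcellular multi-localization system. They will open soon and will provide free services to users in China and abroad.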

【Abstract】 Protein and proteomics research on animals, plants, and other eukaryotes is becoming increasingly important in the post-genomic era. With the completion of genome sequencing projects for a variety of organisms, including the fruit trees apple and grape, the focus of research has begun to shift toward determining the function of the protein products of genes. Protein subcellular localization is an important research topic in the proteomics, cell biology, and molecular bioinformatics of fruit trees. The biological function of a fruit tree protein is closely related to biological processes such as metabolism and signal transduction. In addition, the protein must be in a specific subcellular region to exercise its biological function. For a fruit tree protein of unknown function, obtaining its subcellular location is essential for further study of its molecular function. Obtaining this information through biological experiments is the usual practice, but it is time-consuming and costly. Because the number of fruit tree protein sequences is growing rapidly, large-scale subcellular localization information (for example, for the apple genome-wide proteins) can be obtained in a short period of time only by bioinformatics methods.

From the perspective of biological data, bioinformatics research can be divided into three areas: the generation and management of large amounts of biological sequence data, the use and analysis of biological data, and the research and development of tools for biological data analysis platforms. Because of the large volume of bioinformatics data being generated and the rapid development of the life sciences, both research and production practice need biological data analysis tools that meet their demands. In some research projects, such tools have even become a bottleneck that restricts in-depth study. The research and development of these tools often requires knowledge from biology, mathematics, physics, chemistry, information science, and other areas, which increases its complexity. It is therefore necessary to carry out in-depth research on biological data analysis tools for fruit trees. This research also has important practical value, and it is one of the purposes of our study.

In this thesis, based on quantum algorithms, we analyze in depth the bioinformatics problems of PCD protein subcellular localization prediction and the implementation of apple protein subcellular localization prediction. Drawing on biophysics and physics, we put forward specific solutions and implementations. The main work and innovations are summarized as follows:

1. Starting from the composition of the amino acid sequence of a protein and using the idea of granularity from physics, the concept of the granularity of a protein amino acid sequence is proposed, and the amino acid sequence is analyzed by protein granularity. The concepts of protein granularity order, protein granularity bound, protein granularity limit, and protein granularity increment are given. We found several useful phenomena: protein granularity is unevenly distributed along the protein sequence; each protein sequence has its own protein granularity limit; and, for all proteins, the granularity of each order has a common bound. In terms of prediction, protein granularity includes the amino acid composition information, the sequence-order information, and the same-amino-acid "neighbor" information, and the protein granularity increment naturally includes the sequence length information. A concrete method for constructing feature vectors of protein sequences from protein granularity is given, and the related parameters are described in detail. Using the protein granularity increment, protein secondary structure classes on standard datasets and the sub-chloroplast localization of plant proteins were predicted. The results are better than those of previous work, which further illustrates that protein granularity is a useful indicator of protein attributes.

2. The ZD98, ZW225, and CL317 apoptosis protein standard datasets were selected. Using protein granularity to extract features from the apoptosis protein sequences, 38-dimensional feature vectors were obtained. Subcellular localization prediction for the apoptosis proteins was conducted with an improved quantum neural network algorithm (QNN). The overall prediction accuracies were 87.8%, 83.1%, and 85.5%, respectively. These accuracies are equal to or higher than those of the original authors, indicating that the protein granularity method combined with QNN is valid for apoptosis protein subcellular localization prediction.

3. Based on the published apple genome-wide protein sequences, protein granularity feature vectors were obtained, including second-order protein granularity composition, third-order protein granularity composition, and the fusion of multiple granularity spaces. Then, following the superposition of wave functions in quantum mechanics, a new quantum algorithm (QSVM) was developed. Subcellular localization prediction was conducted for 63,541 amino acid sequences of the apple genome-wide proteins, and the corresponding results are presented. From these results, apple genome-wide protein subcellular sites database 1 was obtained.

4. A high-quality benchmark dataset of multi-location plant proteins constructed by Chou was selected. A separate-processing prediction mode is presented, in which multi-label proteins and single-label proteins are predicted separately. GO annotations are used for feature extraction from the protein sequences. The predictions achieve high accuracy, providing a new method for multi-location protein prediction.

5. Based on the apple genome-wide protein dataset, GO annotations were used for feature extraction from the apple protein sequences that have GO annotations, combined with feature extraction based on the proposed theory of protein granularity. A new quantum algorithm (SQSVM) was developed, and subcellular localization prediction was conducted for the 15,297 GO-annotated amino acid sequences of the apple genome-wide proteins. The corresponding results are presented. On this basis, apple genome-wide protein subcellular sites database 2 was constructed.

6. Based on the conclusions of this thesis, two subcellular localization websites have been built as concrete biological data analysis platforms: an apple protein subcellular localization system and a plant protein subcellular multi-localization system. The websites will be launched soon and will provide free services to users in China and abroad.
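
To make the idea of a sequence-derived feature vector in item 1 more concrete, the following is a minimal Python sketch. It is not the protein granularity definition from the thesis, which the abstract does not give, and it does not reproduce the 38-dimensional vector of item 2. It only illustrates, under simple assumptions, three of the kinds of information the abstract names: amino acid composition, adjacency of identical residues, and sequence length. The function names and the 41-dimensional layout are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines vector positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def clean(seq):
    """Upper-case the sequence and drop any non-standard residue symbols."""
    return [ch for ch in seq.upper() if ch in AMINO_ACIDS]


def composition(seq):
    """Fraction of each standard amino acid among the residues of seq."""
    residues = clean(seq)
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def same_residue_adjacency(seq):
    """For each amino acid, the fraction of adjacent pairs made of two copies of it."""
    residues = clean(seq)
    n_pairs = max(len(residues) - 1, 1)  # avoid division by zero for short input
    counts = Counter(a for a, b in zip(residues, residues[1:]) if a == b)
    return [counts[a] / n_pairs for a in AMINO_ACIDS]


def feature_vector(seq):
    """Illustrative 41-dimensional vector: composition, adjacency, length.

    Hypothetical layout, chosen for this sketch only.
    """
    length = float(len(clean(seq)))
    return composition(seq) + same_residue_adjacency(seq) + [length]


if __name__ == "__main__":
    vec = feature_vector("MKKLLAAGGS")
    print(len(vec), vec[-1])
```

A vector of this kind could then be passed to any classifier. The thesis itself uses its own granularity features with the QNN, QSVM, and SQSVM algorithms, none of which are sketched here.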
