Research on Protein Structure Prediction Based on Multiple Classifier Combination

【Author】 王海瑜

【Advisors】 潘泉; 张洪才

【Author information】 Northwestern Polytechnical University, Systems Engineering, 2004, Master's thesis

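【Abstract, translated from the Chinese】 With the steady progress of the Human Genome Project, more and more protein sequences are being determined, and using theoretical computation to study protein structure and function, and thereby guide experiments, is very meaningful work. Starting from the primary sequence of proteins, this thesis uses several multiple classifier combination algorithms to classify protein structures. The main work is as follows: 1. Background knowledge about proteins is briefly introduced, and, in studying the protein structure prediction problem, the use of multiple classifier combination algorithms for classifying protein structures is proposed. Building on a study of classifiers such as the support vector machine and of multiple classifier combination, their types and algorithms are analyzed. 2. The current state of protein fold prediction is reviewed, and a cascade of multiple classifiers based on support vector machines is proposed for the fold classification problem. The experimental accuracy is nearly four percentage points higher than with direct classification, which demonstrates the effectiveness of the approach. 3. The theoretical framework of multiple classifier fusion is analyzed, and the decision template algorithm is applied to predicting protein structural classes. Three improvements to the algorithm are then proposed, and different experiments are designed to validate them. The results improve to varying degrees in every experiment, which indicates that the improvements are effective and that applying fusion algorithms to protein structural class classification is a feasible approach. 4. The theoretical framework of multiple classifier selection is analyzed, and dynamic selection based on local class accuracy and clustering-and-selection are described and applied to the classification of protein homo-oligomers. A comparison with the support vector machine shows that predicting protein homo-oligomers with the selection algorithms gives better accuracy and reliability.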
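
Item 3 of the abstract names the decision template algorithm without describing it. As a rough illustration only, the sketch below follows the commonly described form of that method: each class template is the mean decision profile of that class's training samples, and a new sample is assigned to the class with the nearest template. The function names, the squared Euclidean distance, the class labels, and the toy support values are all assumptions made for this example; they are not taken from the thesis, and the three improvements the thesis proposes are not shown.

```python
from collections import defaultdict


def fit_decision_templates(profiles, labels):
    """Average the decision profiles of each class into one template.

    profiles: list of decision profiles; each profile is a flat list holding
              every base classifier's class-support values for one sample.
    labels:   true class label of each profile.
    """
    sums = {}
    counts = defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        sums[label] = [s + p for s, p in zip(sums[label], profile)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict_with_templates(templates, profile):
    """Return the class whose template is closest to the given profile."""
    def sq_dist(a, b):
        # Squared Euclidean distance (an assumed similarity measure).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(templates, key=lambda label: sq_dist(templates[label], profile))


# Toy data (hypothetical): 2 base classifiers x 2 classes, so each
# decision profile holds 4 support values.
train_profiles = [
    [0.9, 0.1, 0.8, 0.2],
    [0.7, 0.3, 0.6, 0.4],
    [0.2, 0.8, 0.3, 0.7],
    [0.4, 0.6, 0.1, 0.9],
]
train_labels = ["all-alpha", "all-alpha", "all-beta", "all-beta"]

templates = fit_decision_templates(train_profiles, train_labels)
predicted = predict_with_templates(templates, [0.6, 0.4, 0.7, 0.3])
```

In a real setting the support values would come from trained base classifiers (for example, probability outputs of several support vector machines) rather than from hand-written lists.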

【Abstract】 The success of the Human Genome Project has rapidly increased the number of protein sequences entering data banks. Using theoretical computation to predict protein structure and function, and thereby guide experiments, is very significant work. In this thesis, we use several multiple classifier combination methods to classify protein structures from their primary sequences. The main work is summarized as follows: 1. Background knowledge about proteins is introduced first. In studying protein structure prediction, we apply multiple classifier combination methods to classify protein structures, and, building on earlier research, we summarize the methods of multiple classifier combination and classifiers such as the support vector machine. 2. For multi-class protein fold recognition, we use a cascade algorithm based on support vector machines to classify the folds. The overall accuracy is nearly 4 percentage points higher than with direct classification, which suggests that the approach is feasible. 3. We investigate the theoretical framework of multiple classifier fusion and apply the decision template algorithm to classify protein secondary structural classes. Three improved algorithms are then proposed and validated in different experiments. The results are better in each case, which shows that both the improved algorithms and the idea of using multiple classifier fusion to classify protein structural classes are effective. 4. We examine the theoretical framework of classifier selection, describe dynamic classifier selection using local class accuracy estimates and the clustering-and-selection algorithm, and apply both to classifying protein homo-oligomers. Compared with prediction by a support vector machine, the experimental results suggest that the selection algorithms achieve better precision and reliability. This thesis was supported by the postgraduate start-up seed fund of Northwestern Polytechnical University, No. Z20030048.
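
Item 4 mentions dynamic classifier selection using local class accuracy estimates. The sketch below is a minimal, assumed rendering of that idea as it is usually described: for a query sample, take its k nearest validation samples, score each classifier by the fraction of neighbours it assigns to its predicted class that truly belong to that class, and return the prediction of the best-scoring classifier. The function name, the value of k, the distance measure, the two threshold classifiers, and the "dimer"/"trimer" labels are hypothetical stand-ins; the thesis's own classifiers and data are not reproduced here.

```python
def dcs_local_class_accuracy(classifiers, validation, x, k=3):
    """Return the label chosen by the locally most accurate classifier.

    classifiers: list of callables mapping a feature vector to a class label.
    validation:  list of (feature_vector, true_label) pairs.
    x:           feature vector of the sample to classify.
    """
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # The k validation samples closest to x define the local region.
    neighbours = sorted(validation, key=lambda item: sq_dist(item[0], x))[:k]

    best_label, best_score = None, -1.0
    for clf in classifiers:
        label = clf(x)
        # True labels of neighbours that this classifier assigns to `label`.
        assigned = [true for feats, true in neighbours if clf(feats) == label]
        if assigned:
            score = sum(1 for true in assigned if true == label) / len(assigned)
        else:
            score = 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Two toy threshold classifiers on a one-dimensional feature (hypothetical).
def clf_low(v):
    return "dimer" if v[0] < 0.5 else "trimer"


def clf_high(v):
    return "dimer" if v[0] < 1.5 else "trimer"


validation_set = [
    ([0.0], "dimer"), ([0.4], "dimer"), ([0.8], "dimer"),
    ([1.0], "dimer"), ([1.2], "dimer"),
    ([2.0], "trimer"), ([2.4], "trimer"),
]

chosen = dcs_local_class_accuracy([clf_low, clf_high], validation_set, [1.1])
```

Near the query point, `clf_low` mislabels every neighbour while `clf_high` labels them all correctly, so the selection step falls back on `clf_high`. Ties between classifiers are resolved here by list order, which is one of several possible conventions.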

  • 【Classification code】Q51
  • 【Times cited】7
  • 【Downloads】254