Explorations of the Influence Factors on Accuracies of Protein Secondary Structure Prediction (蛋白质二级结构预测准确率影响因素探讨)

【Author】 闫蓬勃 (Yan Pengbo)

【Advisor】 刘建国 (Liu Jianguo)

【Author Information】 Hebei University, Biochemistry and Molecular Biology, 2009, Master's thesis

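【Chinese Abstract】 Deriving a protein's three-dimensional structure from its primary sequence is currently one of the major problems in bioinformatics. Computational methods are widely used to predict protein secondary structure, and their development falls roughly into two stages. The first stage was grounded in mathematical statistics and used information about single amino acids, as in the Chou-Fasman and GOR (Garnier-Osguthorpe-Robson) methods. The second stage uses evolutionary information: tools such as BLAST search sequence databases and build multiple alignments of the query sequence to obtain homology information, and PSI-BLAST is used to derive the corresponding evolutionary profile as a PSSM (position-specific scoring matrix). This work examines how amino acid properties can improve PSSM-based prediction and raise its accuracy. Using SVMs (support vector machines), two physicochemical factors, hydrophobicity and HEC (helix, strand, coil) propensity, were each added to the PSSM as per-residue features for predicting protein secondary structure. The work also improves how the SVM is applied by implementing a two-layer SVM, so that the physicochemical factors and the two-layer SVM together raise secondary structure prediction accuracy. Correlation coefficient analysis of the results shows that the added hydrophobicity and HEC propensity are weakly positively correlated with Q3 and significantly positively correlated with SOV, which indicates that amino acid hydrophobicity and HEC propensity play a role in the formation of protein secondary structure. In the two-layer SVM experiments, the two-layer network had the advantage in prediction accuracy both in absolute accuracy and in the correlation coefficient analysis, showing that the improved SVM clearly optimizes the prediction process. Compared with the PSSM method currently in common international use, Q3 and SOV increased by 2.76% and 1.25%, respectively.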
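The per-residue encoding described above (PSSM plus hydrophobicity plus HEC propensity, fed to an SVM) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: the Kyte-Doolittle scale stands in for the unspecified hydrophobicity scale, the HEC propensities are estimated Chou-Fasman-style from labeled training chains, the window half-width of 6 and zero padding at the chain ends are arbitrary choices, and the names `hec_propensities` and `window_features` are hypothetical. The PSSM rows (20 scores per residue, e.g. from PSI-BLAST) are taken as given.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (an assumed choice; the abstract
# does not say which hydrophobicity scale the thesis used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
STATES = "HEC"  # helix, strand, coil


def hec_propensities(sequences, labels):
    """Chou-Fasman-style conformational parameters from labeled chains.

    P(a, s) = (fraction of residues of type a found in state s)
              / (fraction of all residues found in state s)
    Values above 1 mean residue type a favors state s.
    """
    pair, aa, state = Counter(), Counter(), Counter()
    total = 0
    for seq, lab in zip(sequences, labels):
        for a, s in zip(seq, lab):
            pair[(a, s)] += 1
            aa[a] += 1
            state[s] += 1
            total += 1
    return {
        a: [
            (pair[(a, s)] / aa[a]) / (state[s] / total) if state[s] else 0.0
            for s in STATES
        ]
        for a in aa
    }


def window_features(seq, pssm, propensity, half_window=6):
    """One input vector per residue for the SVM.

    Each window position contributes 20 PSSM scores, 1 hydropathy value
    and 3 H/E/C propensities (24 numbers); positions past either chain
    end are zero-padded. half_window=6 (13 residues) is an arbitrary default.
    """
    if len(pssm) != len(seq):
        raise ValueError("need one PSSM row per residue")
    width = 20 + 1 + len(STATES)
    features = []
    for i in range(len(seq)):
        vec = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < len(seq):
                a = seq[j]
                vec.extend(pssm[j])
                vec.append(KYTE_DOOLITTLE.get(a, 0.0))
                vec.extend(propensity.get(a, [0.0] * len(STATES)))
            else:
                vec.extend([0.0] * width)
        features.append(vec)
    return features
```

With the default window each residue becomes a vector of 13 × 24 = 312 numbers. The propensities should be estimated on training chains only, so that test labels do not leak into the features.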

【Abstract】 One of the most persistent problems in bioinformatics has been deriving a protein's unique tertiary structure from its primary structure. Most current protein secondary structure prediction programs use multiple sequence alignments to capture local sequence patterns as input for machine learning techniques. However, such local sequence patterns ignore the amino acids' intrinsic propensities for the three secondary structure states, namely α-helices, β-strands, and others (often referred to as coils). We therefore propose an approach that integrates multiple sequence alignment profiles with amino acid propensities in the input coding scheme for machine learning. The position-specific scoring matrices (PSSMs) from PSI-BLAST were integrated with amino acid conformational parameters and hydrophobicity properties for protein secondary structure prediction with support vector machines (SVMs). The paper describes an SVM-based method that combines hydrophobicity and HEC propensity with the PSSM to predict protein secondary structure, and it also uses a two-layer SVM. Correlation coefficient analysis showed that hydrophobicity and HEC propensity were only weakly correlated with the Q3 results but clearly correlated with the SOV results. The two-layer SVM improved both Q3 and SOV. Compared with the PSSM-only method, the integrated method increased Q3 and SOV by 2.76% and 1.25%, respectively.
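A two-layer SVM is commonly built by stacking: the per-residue H/E/C decision values of the first layer, taken over a window of neighboring residues, become the input of a second SVM, and the final predictions are scored with Q3, the percentage of residues assigned the correct state. The sketch below shows only that data flow and the Q3 calculation. It is an assumption that the thesis arranged its two layers this way; the SVMs themselves are omitted, the window half-width of 7 is arbitrary, the names `second_layer_inputs`, `predict_states` and `q3` are hypothetical, and SOV is not computed.

```python
STATES = "HEC"  # helix, strand, coil


def second_layer_inputs(scores, half_window=7):
    """Build second-layer SVM inputs by stacking first-layer outputs.

    scores[i] holds the three (H, E, C) decision values the first-layer
    SVMs produced for residue i (assumed one-vs-rest). The second layer
    sees those values for a window of 2 * half_window + 1 residues,
    zero-padded past the chain ends, so it can use neighbors' predictions.
    """
    n = len(scores)
    inputs = []
    for i in range(n):
        vec = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(scores[j] if 0 <= j < n else (0.0, 0.0, 0.0))
        inputs.append(vec)
    return inputs


def predict_states(scores):
    """One-vs-rest decision: pick the state with the largest score."""
    return "".join(
        STATES[max(range(len(STATES)), key=lambda k: s[k])] for s in scores
    )


def q3(predicted, observed):
    """Q3: percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For instance, with first-layer scores `(0.9, -0.2, 0.1)`, `(0.1, 0.8, -0.5)` and `(-0.3, -0.1, 0.4)`, `predict_states` returns `"HEC"`, and scoring that against an observed `"HCC"` gives a Q3 of about 66.7.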

  • 【Online Publication Submitted By】 Hebei University
  • 【Online Publication Year/Issue】 2012, Issue 02