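Research on Protein-Protein Interactions Based on Primary Structure

Original title: 基于序列从头预测法的蛋白质相互作用研究 (literally, "Research on Protein-Protein Interactions Based on Ab Initio Prediction from Sequence")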

【Author】 Lu Linying (陆林英)

【Advisor】 Ma Zhiqiang (马志强)

【Author information】 Northeast Normal University, Computer Software and Theory, 2008, Master's thesis

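【Abstract (translated from the Chinese)】 Protein-protein interactions underlie most cellular functions and are directly tied to the diversity of biological function. They take two main forms: "physical" interactions and functional interactions. In general usage, interaction means that the proteins take part in the same metabolic pathway and have similar functions, that is, functional interaction. Proteomics studies the structure, interactions, and functions of proteins at the whole-system level. Because interactions link protein structure to function, they are a focus of current research. The study of protein interactions has moved beyond experimental techniques alone: computational methods are now used to validate experimental results and to make high-throughput predictions. These include genome-based methods, evolution-based methods, and ab initio prediction from protein sequence. Studies show that genome-based and evolution-based methods each have limitations; genome-based methods, for example, require complete genome information. Ab initio prediction needs only the primary structure of the protein and places no restriction on sequence length, so it is widely applicable. This thesis uses ab initio prediction from sequence to identify interacting proteins. Several properties of each protein sequence are computed, such as amino-acid hydrophobicity, molar molecular weight, polarity, and average buried area. A BP (back-propagation) neural network and a support vector machine (SVM) are then applied to classify the protein-interaction dataset, and their results are compared. The Saccharomyces cerevisiae (yeast) interaction data from the MIPS database serve as the benchmark dataset, comprising 4837 positive pairs and 9674 negative pairs. The experiments show that both classifiers are fairly accurate: the BP neural network reaches accuracy above 87% with high sensitivity, and the SVM with a Gaussian kernel reaches accuracy above 64% on the same dataset. Both can therefore be used to validate and predict protein-interaction data obtained experimentally. Further analysis of the experiments indicates that ab initio prediction from sequence, combined with the classifiers used here, can effectively identify interacting protein pairs.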
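The feature step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The Kyte-Doolittle hydropathy scale is an assumed choice (the abstract does not name a scale), the function names are hypothetical, and min-max rescaling is only one plausible reading of the normalization mentioned in the English abstract. Tables for polarity and average buried area would plug in the same way and are not reproduced here.

```python
# Kyte-Doolittle hydropathy index per residue.
# Assumption: the thesis does not say which hydrophobicity scale it used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, scale):
    """Average a per-residue property over a sequence, skipping unknown residues."""
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


def pair_features(seq_a, seq_b, scales):
    """Concatenate the per-protein features of both proteins into one pair vector."""
    return [mean_property(seq, scale) for seq in (seq_a, seq_b) for scale in scales]


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

Because each property is averaged over the whole sequence, every protein yields a vector of the same size regardless of its length, which matches the abstract's point that the method places no restriction on sequence length.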

【Abstract】 Proteins are among the most important components of a living cell, and many cellular functions are carried out through protein interactions. Functional diversity is strongly related to protein-protein interactions, which take two main forms: "physical" interactions and functional interactions. In general, interacting proteins participate in the same metabolic pathway and perform the same function; in other words, the interaction is functional. Proteomics is the systematic study of the structure, interactions, and functions of proteins, and protein interaction is one of its most active topics. Experimental techniques for finding protein-protein interactions have several limitations, which has stimulated research into computational prediction. The main computational approaches are based on genome information, evolutionary information, and the primary structure of proteins. Some of these have limitations of their own; genome-based methods, for instance, need full genome information. The primary-structure approach requires only the primary structure of the protein, places no limit on sequence length, and is therefore widely applicable. In this thesis, we use the primary structure of proteins to predict protein-protein interactions. Sequence features are generated statistically and then normalized before the experiments. A small number of features are calculated for each protein: hydrophobicity, molecular weight, polarity, and average buried area. A BP neural network and an SVM are used to separate the two classes of protein pairs. We used the S. cerevisiae (yeast) dataset to verify the predictive ability of our method; it includes 4837 interacting protein pairs and 9674 non-interacting protein pairs. The BP neural network achieves above 87% accuracy under 10-fold cross-validation, and the SVM achieves above 64% accuracy. In addition, the experiments show that our methods can identify and predict interacting protein pairs.
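The evaluation protocol named in the abstract (10-fold cross-validation, reported as accuracy and sensitivity) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code. All function names are hypothetical, and `train_and_predict` is a placeholder callback standing in for either classifier: it takes training features, training labels, and test features, and returns predicted labels (1 for interacting, 0 for non-interacting).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, sensitivity


def cross_validate(features, labels, train_and_predict, k=10, seed=0):
    """Average accuracy and sensitivity over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predictions = train_and_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in held_out],
        )
        scores.append(
            accuracy_and_sensitivity([labels[i] for i in held_out], predictions)
        )
    mean_accuracy = sum(a for a, _ in scores) / k
    mean_sensitivity = sum(s for _, s in scores) / k
    return mean_accuracy, mean_sensitivity
```

Sensitivity is worth reporting next to accuracy here because the dataset is imbalanced: with 4837 positive and 9674 negative pairs, a classifier that always predicts "non-interacting" would already score 9674/14511, about 66.7% accuracy, with zero sensitivity.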

  • 【Classification code】TP29-AI
  • 【Times cited】4
  • 【Downloads】231