节点文献

基于网络节点拓扑参数的关键蛋白质识别研究

On Identification of Essential Protein Based on Node’s Topological Parameters in Protein Networks

【作者】 黄海滨

【导师】 杨路明; 王建新;

【作者基本信息】 中南大学 , 计算机应用技术, 2008, 博士

【摘要】 蛋白质分子功能的重要性与它在蛋白质网络中对应节点的拓扑特性紧密相关。关键蛋白质的识别有助于从系统水平上理解生命活动的内在组织和过程,在疾病诊疗及药物设计等方面有重要的应用前景。与生物学实验方法及其它方法相比,基于拓扑结构的生物信息学方法在关键蛋白质识别上有独特优势。针对已有方法对关键蛋白质识别度不高的现状,认为进一步提高识别度有两条途径:一是发现与关键蛋白质关系更密切的参数,二是充分挖掘现有参数的信息并进行有效地整合。对于第一种途径,根据点覆盖在网络(图)拓扑结构上的重要地位而研究将其引入关键蛋白质的识别中;对于第二种途径,主要探讨复合参数的构造及异步识别方法,通过将多个参数所隐含的关键蛋白质信息进行有效整合而提高识别度。点覆盖问题虽然可以在参数计算理论的架构内求精确解,但是目前在理论及应用上有一定的局限性。将参数计算理论引进随机网络领域,利用随机网络统计和概率分布等特性,从全局和整体上分析并揭示参数化点覆盖问题低度(1度和2度)节点核化过程中问题的核及度分布演变的内在机制和变化规律。同时,根据核与节点度分布以及边的关系,提出随机网络参数化点覆盖问题的d-核化可决策性。在1度点核化的研究中,首先分析节点之间的映射关系,然后将它们的邻接关系进行量化,得出1度点核化算法对平均度为ω≤2.3的随机网络点覆盖问题的核化强度最高,同时指出它的d-核化(d=1)可决策性。在2度点核化的研究中,提出2度点三角形子网的计数方法;通过研究子网对节点的共享关系,分析2度点核化过程中核及度分布演变的动态过程,得出2度点核化算法对2度点分布概率在0.75左右的随机网络的核化强度最高,同时也指出它的d-核化(d=2)可决策性。初步结果表明,对随机网络点覆盖问题低度点核化过程的分析方法不但具有理论上的意义,而且随着问题随机度的大小而对问题有不同程度的把握能力,并提供了随机网络上这一NP完全问题的求解方法,也为参数计算在包括蛋白质网络在内的已知度分布的一类不确定问题中的应用提供了可能。对一给定的网络(图)来说,虽然最小点覆盖集的大小是一个固定值,但就一般情况而言可以求解出多个节点构成不同的最小覆盖集。为此,提出骨干点覆盖集、非骨干覆盖集及非覆盖集等概念,然后对蛋白质网络进行最小点覆盖分析并获得一种新的拓扑参数——点覆盖参数,从另一种角度描述节点的拓扑重要性。为了避开点覆盖参数精确求解方法中可能出现的NP-难问题,根据稀疏网络中存在大量的◇、Δ2、∧2子网的特点,将确定算法与非确定算法相结合,提出基于随机核化的快速算法(A-Q算法)。该算法通过引进参数计算的相关算法将复杂度大幅度降低,同时通过随机和统计方法使得到的结果尽可能接近实际解。结果显示,该算法得到的点覆盖参数与关键蛋白质有着密切的联系,在识别仿真上也表现出较好的性能,因此在描述节点的拓扑特性上具有重要意义。把关键蛋白质识别看作是一类特殊的模式识别。从相关分析出发对关键蛋白质与其主要拓扑参数的相互关系进行研究,发现参数对关键蛋白质识别能力的大小与两者之间的相关性有关;研究复合参数识别度与独立参数识别度、与独立参数相关性之间的关系,发现参数之间相关性的大小在很大程度上预示它们所蕴含的关键蛋白质信息之间互补性的强弱;根据上述发现,探讨利用包括点覆盖在内的各个参数的有限信息进行整合的方法,提出有效的复合参数构造方法及异步识别方法。实验结果证实,通过该技术获得的识别度明显高于其它识别技术。

【Abstract】 Recent research discovers that the essentiality of a protein molecule in function is correlated with its topological properties in a protein network. The study of essential protein (EP) is no only meaningful in the understanding of the organization and process of life activities, but also important in diagnosis-treatment and medicament design. Comparing with biological experiments and other methods, bioinformatics methods based on topological structure possess particular advantage in the identification of EP. However, there is still some dissatisfaction in the identification. To improve the performance, two ways are proposed in this paper: one is to find new parameters closer to EP, the other is to integrate the EP information from known parameters. For the former, the minimal vertex cover (MVC) is introduced to get a new topological parameter - vertex cover parameter (VP) for the first time. For the later, the combination of known parameters is explored.The exact solution to a minimal vertex cover problem (VCP) can be found within the frame of the theory of parameterized computation. However, there are still some limits in theory and practical. In this paper, the theory is introduced into the realm of random graph, so as to use statistic properties and probability methods to deal with a VCP wholly and uncover the inherence and evolvement laws of the kernel and the degree distribution in the kernelization by low (1 and 2) degree vertex (1-DV and 2-DV). On the base of fixed-parameter tractability, the d-decidable by way of kernelization (d-DBK) of the parameterized VCP of random graph is proposed.In the study of 1-DV kernelization (1-DVK), the mapping of 1-DV into the other nodes in a random graph is analyzed, and their adjacency relationship is quantified. It is found in this paper that the strength of 1-DVK gets its maximum whenω≤2.3 withωbeing the average degree of the graph, and the 1-DBK of the parameterized VCP of the graph is presented. In the study of 2-DV kernelization (2-DVK), the counting method for 2-DV triangle subnetworks is analyzed and the situation that a node is shared by more than one of the subnetworks is studied. After that, the dynamic and evolvement mechanism of the kernel and the degree distribution in the process of 2-DVK are described. Accordingly, two deductions are obtained: one is that the strength of 2-DVK gets its maximum in a random graph when the probability of its 2-DV is about 0.75, and the other is that the parameterized VCP (G, k) of a random graph given byφ(x) is 2-DBK when k smaller than a given value relation toφ(x).For its important role in the topology of a network, the minimal vertex cover (MVC) is introduced into the study of EP in a protein network and VP is proposed as a new topological parameter. To avoid the NP-hard which probable met in the calculation of the parameter by exact methods, the kernelization by low degree nodes and the combination of exact and no-exact algorithms are studied according to the sparseness of protein networks with lots of◇,△2 and∧2 subnetworks. Then, a quick algorithm (A-Q algorithm) based on randomized kernelization is presented. The result that the identifying degree of VP generated from the algorithm is better than those of the existed parameters obviously in the identification of EP, gives an evidence to approve the importance of the parameter in describing node’s topological characteristic.The identification of EP is considered as a special kind of pattern recognition by setting up the quantification of nodes’ (proteins’) relationship -topological parameters as its ground. The correlation between a protein’s essentiality and its main topological parameters, including VP, is analyzed and so is the nature of the essential-node-judgment of the parameters. In order to study the mutual complement of the EP information from different single parameters (SPs), the relation between the identification degree of a combined parameter (IDCP) and those of its SPs is presented theoretically, so is the relation between IDCP and the correlations of its SPs. With the above observation, the integration of the EP information from different SPs is explored and a combination method is proposed to get combined parameters. After that, an asynchronous recognition algorithm is developed, and the experiment results show that the identifying ability of the technique is greater than those of the others obviously in the identification of EP.

  • 【网络出版投稿人】 中南大学
  • 【网络出版年期】2010年 02期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络