基于多尺度分析的人类胎儿脑发育过程的基因调控网络模型研究

Study of the Gene Regulatory Networks Model of Developing Human Fetal Brain Based on Multiscale Analysis

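【Author】 Luo Wanchun (罗万春)

【Advisor】 Yi Dong (易东)

【Author information】 Third Military Medical University, Epidemiology and Health Statistics, 2009, doctoral dissertation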

【Abstract】
Background
Computational neuroscience uses mathematical analysis and computer simulation to model and study the nervous system at different levels. One of its main aims is to explore the process of brain development and its molecular mechanisms. The nervous system comprises the central and peripheral nervous systems, and the central nervous system comprises the brain and the spinal cord. Brain development is affected by many factors, among which regulation by the thousands of genes in neurons plays a key role. Studies have shown that these genes exert important regulatory functions at three levels: (1) large numbers of genes are expressed in a specific spatial and temporal order and control the development of different brain regions at different fetal stages; (2) the genes fall into a number of clusters that influence and regulate one another, forming a highly complex network of associations; (3) the genes and gene clusters are associated and interact in time and space, and the net result is to guide the correct development of the brain. Gene regulation at these three levels corresponds to complex associations of genetic information at different scales, so brain development can be studied from the perspective of multiscale analysis. Preliminary work for this study obtained gene chip data for three regions of the developing human fetal brain (cerebellum, cerebral cortex and hippocampus); each region has data for 10,080 genes at seven time points. Based on these expression data, this thesis uses multiscale analysis and suitable mathematical models to predict the gene clusters of the three regions and their regulatory networks.

Methods
The research methods focus on the following aspects.
(1) Relaxed conditions for data preprocessing. In line with the basic requirement that gene expression be meaningful and with the needs of this study, a gene was retained for a region whenever its ratio was > 0 at all seven time points, on the view that it might play a role in gene regulation. This gives a preliminary gene selection; genes unrelated to regulation were then eliminated while solving the later models.
(2) A mathematical model of the gene chip, the y ~ n curve. The genes on a chip and their expression levels do not form a stable functional relationship. We therefore created the y ~ n curve model of the gene chip, assessed its advantages by using it to denoise gene chip data, and further analyzed its role in multiscale analysis and the best wavelet function for each chip data set.
(3) A multiscale clustering model to obtain gene clusters. Traditional gene data analysis groups genes by clustering. We performed multiscale analysis on the y ~ n curve and clustered at every scale; genes were assigned to the same gene cluster only when they were clustered together at every scale.
(4) An integer nonlinear programming model based on objective weighting to predict the regulatory network of the gene clusters. The mean expression of the genes in a cluster was taken as the expression level of that cluster, the gene clusters of the cerebellum, cerebral cortex and hippocampus were each treated as a whole, and every cluster was weighted objectively by entropy. We then built an integer nonlinear programming model based on the weight matrix and compared its results with those of a correlation coefficient model; a regulatory relationship was accepted only when the two results agreed.

Results
Cluster analysis was performed in SPSS, the optimization model was solved in LINGO, and all other models were programmed in MATLAB R2007a. The main results are as follows.
(1) After data preprocessing, 1153 genes in the cerebellum, 956 genes in the cerebral cortex and 1106 genes in the hippocampus had expression intensities at the 7 time points.
(2) The y ~ n curves were denoised with 16 wavelet functions. The best denoising effect was obtained with the Daubechies wavelets db7 for the cerebellum, db4 for the cerebral cortex and db4 for the hippocampus, and for all three regions the signal-to-noise ratio of the y ~ n curve was higher than that of the original chip data.
(3) In the gene cluster analysis, 402 genes formed 34 gene clusters in the cerebellum, 304 genes formed 23 gene clusters in the cerebral cortex, and 384 genes formed 25 gene clusters in the hippocampus. Forty-nine genes belonged to gene clusters in all three regions.
(4) In the cerebellum, 30 gene clusters took part in the regulatory network, with 39 positive and 5 negative regulatory relationships. In the cerebral cortex, 15 gene clusters took part, with 6 positive and 11 negative relationships. In the hippocampus, 16 gene clusters took part, with 20 positive and 5 negative relationships.
(5) Five genes, IFITM3, H2AFY, SSRP1, SCAP and CD59, took part in the gene cluster regulatory networks of the cerebellum, cerebral cortex and hippocampus simultaneously.

Conclusions
(1) The y ~ n curve loses less of the information in gene chip data and also denoises better than working on the raw data, so it is an effective tool for gene chip data analysis.
(2) Multiscale clustering is a new clustering method for gene chips. It is a meaningful innovation in gene chip processing, clusters more accurately, and is a promising statistical method that can be extended to other fields.
(3) The development and function of the human fetal brain are regulated by genes; the gene clusters and their regulatory relationships are very complex but worth studying.
(4) The integer nonlinear programming model built in this thesis selects weights more objectively and reasonably than the weight matrix model and gave satisfactory results in the study of gene cluster regulation in the developing human fetal brain. The model can be extended to gene regulatory network studies in other tissues and other species.
(5) The five genes IFITM3, H2AFY, SSRP1, SCAP and CD59 play an important role in gene regulation in the developing human fetal brain and are closely linked to the occurrence of diseases (hyperlipoproteinemia type II, diabetic angiopathies, prostatic neoplasms, etc.).
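
The rule in Methods (3), that genes join the same gene cluster only when they are clustered together at every scale, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the per-scale cluster labels have already been computed (the thesis did its clustering in SPSS), and the function name `consensus_clusters`, the placeholder gene names, the label values and the choice to drop singleton groups are all hypothetical.

```python
from collections import defaultdict


def consensus_clusters(labels_by_scale, min_size=2):
    """Group genes that share the same cluster label at every scale.

    labels_by_scale maps a gene name to a tuple of cluster labels, one per
    scale. Two genes land in the same group only if their tuples are equal.
    Groups smaller than min_size are dropped (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for gene, labels in labels_by_scale.items():
        groups[tuple(labels)].append(gene)
    clusters = [sorted(members) for members in groups.values()
                if len(members) >= min_size]
    return sorted(clusters)


# Hypothetical cluster labels at three scales for five placeholder genes.
labels = {
    "gene_a": (1, 2, 1),
    "gene_b": (1, 2, 1),
    "gene_c": (1, 2, 3),  # differs from gene_a only at the third scale
    "gene_d": (2, 1, 1),
    "gene_e": (2, 1, 1),
}
clusters = consensus_clusters(labels)
print(clusters)
```

Here `gene_c` agrees with `gene_a` and `gene_b` at two scales but not the third, so it is left out of their cluster. Requiring agreement at every scale is stricter than clustering once, which fits the reported result that only part of the preprocessed genes ended up in gene clusters.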

【Key words】 multiscale analysis; brain; gene regulation; gene cluster
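
Methods (4) in the abstract weights each gene cluster by entropy and accepts a regulatory relationship only when the integer nonlinear programming model and the correlation coefficient model agree. The sketch below illustrates those two ideas under stated assumptions. It uses the standard entropy weight method, which may differ from the thesis's exact formula. It takes agreement to mean matching sign. It does not implement the integer nonlinear programming model, so the `model_signs` input stands in for that model's output. The cluster names and expression values are hypothetical.

```python
import math


def entropy_weights(profiles):
    """Standard entropy weight method, one weight per cluster.

    profiles maps a cluster name to its positive mean expression values
    over the time points. Flatter profiles carry less information and
    receive smaller weights.
    """
    diversity = {}
    for name, values in profiles.items():
        total = sum(values)
        shares = [v / total for v in values]
        entropy = -sum(p * math.log(p) for p in shares if p > 0)
        diversity[name] = 1.0 - entropy / math.log(len(values))
    norm = sum(diversity.values())
    if norm == 0:  # every profile is flat: fall back to equal weights
        return {name: 1.0 / len(profiles) for name in profiles}
    return {name: d / norm for name, d in diversity.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def accepted_relations(model_signs, profiles):
    """Keep a relation only if the correlation sign matches the model sign.

    model_signs maps a pair of cluster names to +1 (positive regulation)
    or -1 (negative regulation); here it is a hypothetical stand-in for
    the output of the optimization model.
    """
    kept = {}
    for (a, b), sign in model_signs.items():
        r = pearson(profiles[a], profiles[b])
        if r * sign > 0:
            kept[(a, b)] = sign
    return kept


# Hypothetical mean expression of three clusters at seven time points.
profiles = {
    "C1": [1.0, 1.2, 1.5, 1.9, 2.4, 3.0, 3.7],
    "C2": [0.9, 1.1, 1.4, 1.8, 2.2, 2.9, 3.5],
    "C3": [3.0, 2.6, 2.1, 1.7, 1.4, 1.2, 1.0],
}
model_signs = {("C1", "C2"): 1, ("C1", "C3"): 1, ("C2", "C3"): -1}

weights = entropy_weights(profiles)
accepted = accepted_relations(model_signs, profiles)
print(weights)
print(accepted)
```

In this toy input the pair (C1, C3) is rejected because the stand-in model reports positive regulation while the two profiles move in opposite directions; the other two pairs are kept.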