Research on Clustering Analysis Based on Fingerprint Chromatogram & Application in Discrimination between Disease and Damp-syndrome in Traditional Chinese Medicine

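【Author】 Hu Lin (胡琳)

【Supervisor】 Huang Zhenguo (黄振国)

【Author information】 Donghua University, Control Theory and Control Engineering, 2004, Master's thesis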

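【Abstract (translated from the Chinese)】 Using traditional Chinese medicine (TCM) to determine whether a person is ill, and whether damp-syndrome is present, has long been a topic of study and discussion in the TCM community. This thesis studies chromatographic fingerprints of fresh urine from healthy people, patients with damp-syndrome, and patients without damp-syndrome. A series of studies was carried out on these fingerprints, and some results were obtained.

First, the thesis examines the principles and characteristics of chromatography. Following the approach, common in analytical chemistry, of building mathematical models from chromatographic fingerprints, it analyzes the practical significance of common peaks, the overlap rate, and the n strongest peaks.

Second, the thesis studies and compares the various clustering algorithms. Existing clustering algorithms can be divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.

Partitioning methods: given a database of n objects, a partitioning method constructs k partitions of the data, each representing a cluster, with k ≤ n. Examples are the k-means and k-medoids algorithms. The approach is effective for small databases and has computational complexity O(n^2).

Hierarchical methods: the given set of data objects is decomposed hierarchically. Depending on how the decomposition is formed, hierarchical methods are either agglomerative or divisive. BIRCH is an example, with computational complexity O(n).

Density-based methods: the main idea is to keep growing a cluster as long as the density of the neighborhood (the number of objects or data points) exceeds some threshold. Such methods can filter out "noise" outliers and discover clusters of arbitrary shape. An example is DBSCAN, whose computational complexity is O(n log n) if a spatial index is used and O(n^2) otherwise.

Grid-based methods: the object space is quantized into a finite number of cells that form a grid structure, and all clustering operations are performed on this grid. The main advantage is speed: processing time is independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space. An example is STING, for which generating the clusters takes O(n) time while query processing takes O(g), where g is the number of grid cells at the lowest level and is usually much smaller than n.

Model-based methods: a model is hypothesized for each cluster, and the best fit of the data to the given model is sought. An example is COBWEB, whose computational complexity varies sharply with the number of input attributes and their values.

Clustering based on fuzzy sets: for example, the maximum tree method of fuzzy clustering.

Third, using the n strongest peaks, the overlap rate of common peaks, and the sine and cosine of the angle between peak vectors, the study built similarity matrices, dissimilarity matrices, or similarity tables for the sample fingerprints. On the basis of these data models, clustering was carried out with k-means, the maximum tree method of fuzzy clustering, and an improved COBWEB method, with differing results. The improved COBWEB method uses the overlap rate of common peaks as the intra-class similarity, P(A_i = V_ij | C_k), and the sine of the angle between peak vectors as the inter-class dissimilarity, P(A_i = V_ij). When processing the peak data, it reduces or removes the clustering weight of peaks that are common to all samples and account for a large share of the total peak area, so that the mostly dissimilar components carry more weight in classification. In the comparison, the COBWEB method gave the better results.

Finally, the clustering algorithms were implemented in VC++. The thesis also proposes improvements to sample collection and to the clustering methods, to further raise the level at which cluster analysis can be applied in TCM to distinguishing illness from health and damp-syndrome from non-damp-syndrome.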
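The abstract names its similarity measures without defining them. As a minimal sketch, assume each fingerprint has already been aligned to a shared list of retention times and stored as a vector of peak areas, with 0 meaning no peak at that position. Assume also that the overlap rate is twice the number of common peaks divided by the total peak count of the two fingerprints, a definition that is not confirmed by the thesis. Under those assumptions the measures could be computed roughly as follows. All function names are illustrative and are not taken from the thesis.

```python
import math


def n_strongest_peaks(areas, n):
    """Return the indices of the n largest peak areas, strongest first."""
    order = sorted(range(len(areas)), key=lambda i: areas[i], reverse=True)
    return order[:n]


def overlap_rate(a, b):
    """Assumed definition: 2 * (#common peaks) / (#peaks in a + #peaks in b)."""
    peaks_a = sum(1 for x in a if x > 0)
    peaks_b = sum(1 for y in b if y > 0)
    common = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    total = peaks_a + peaks_b
    return 2.0 * common / total if total else 0.0


def angle_cosine(a, b):
    """Cosine of the angle between two peak-area vectors (a similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def angle_sine(a, b):
    """Sine of the same angle (a dissimilarity: 0 for parallel vectors)."""
    c = angle_cosine(a, b)
    return math.sqrt(max(0.0, 1.0 - c * c))


def sine_dissimilarity_matrix(fingerprints):
    """Symmetric matrix of pairwise sine dissimilarities."""
    return [[angle_sine(p, q) for q in fingerprints] for p in fingerprints]


# Hypothetical toy fingerprints: three aligned retention-time positions.
sample_a = [3.0, 0.0, 4.0]
sample_b = [3.0, 4.0, 0.0]
print(overlap_rate(sample_a, sample_b))   # one common peak out of 2 + 2
print(angle_cosine(sample_a, sample_b))
print(sine_dissimilarity_matrix([sample_a, sample_b]))
```

Because the cosine grows with similarity and the sine grows with dissimilarity, the same pair of vectors can feed either a similarity matrix or a dissimilarity matrix, which may be why the abstract lists both.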

【Abstract】 Discrimination between disease and health, and between damp-syndrome and non-damp-syndrome, has long been a topic of research and discussion in the field of traditional Chinese medicine. This paper studies fingerprint chromatograms of fresh urine from healthy people, patients with damp-syndrome, and patients with non-damp-syndrome. A series of studies was performed on these fingerprint chromatograms, and certain results were obtained.

First, the theory and characteristics of chromatography are explored. A method based on a mathematical model, established from the fingerprint chromatograms, is used to analyze the practical significance of common peaks, the overlap rate, and the n strongest peaks.

Secondly, several cluster analysis methods are studied and compared. The methods can be divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.

Partitioning method: construct a partition of a database D of n objects into a set of k clusters, where each partition represents a cluster and k ≤ n. Examples are the k-means and k-medoids algorithms, with a time complexity of O(n^2).

Hierarchical method: create a hierarchical decomposition of the set of data objects. Depending on how the decomposition is formed, the method is either agglomerative or divisive. An example is the BIRCH algorithm, with a time complexity of O(n).

Density-based method: as long as the density of the neighborhood, that is, the number of data objects in it, exceeds a certain threshold, the clustering process continues. The method can be used to filter out outliers and to discover clusters of arbitrary shape. For the DBSCAN algorithm, the time complexity is O(n log n) if a spatial index is used; otherwise it is O(n^2).

Grid-based method: quantize the object space into a finite number of cells that form a grid structure. All clustering operations are performed on the grid structure. The advantage of the method is that the processing time is independent of the number of objects and depends only on the number of cells in each dimension of the quantized space. For the STING algorithm, the time complexity of clustering is O(n), but the time complexity of a query is O(g), where g is the number of grid cells at the lowest level and is usually far smaller than n.

Model-based method: hypothesize a mathematical model for each cluster, and attempt to optimize the fit between the data and that model. An example is the COBWEB algorithm, whose time complexity varies sharply with the number and values of the input attributes.

Thirdly, similarity matrices, dissimilarity matrices, or similarity tables are established from the n strongest peaks, the overlap rate of common peaks, and the cosine and sine of the angle between peak vectors, all derived from the fingerprint chromatograms of the samples. Based on these data models, clustering was carried out with the k-means algorithm, the maximum tree method of fuzzy clustering, and an improved COBWEB algorithm, with differing results. By comparison, the COBWEB algorithm performed best. In the improved COBWEB algorithm, the overlap rate of common peaks is taken as the intra-class similarity, P(A_i = V_ij | C_k), while the sine of the angle between peak vectors is taken as the inter-class dissimilarity, P(A_i = V_ij). In addition, the weight of common peaks whose area is a large share of the total peak area is reduced or eliminated, so that the contribution of the mostly dissimilar components is magnified.

Finally, the clustering methods are implemented in VC++. The paper also proposes ways to improve sample collection and the clustering methods, so that cluster analysis can be applied more effectively to the discrimination of disease and damp-syndrome in traditional Chinese medicine.

Hu Lin (Control Theory and Control Engineering), supervised by Shao Yuexiang
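The probabilities P(A_i = V_ij | C_k) and P(A_i = V_ij) are the terms of the category utility score that COBWEB uses to evaluate a partition. The sketch below computes the standard textbook form of that score for nominal attributes. It is not the improved variant described in the abstract, which substitutes the overlap rate and the sine of the vector angle for these probabilities. The function name and the toy "present"/"absent" data are assumptions made for illustration.

```python
from collections import Counter


def category_utility(clusters):
    """Textbook category utility of a partition.

    clusters: list of clusters, each a list of equal-length tuples of
    nominal attribute values. Returns
      (1/K) * sum_k P(C_k) * [ sum_ij P(A_i=V_ij | C_k)^2
                               - sum_ij P(A_i=V_ij)^2 ]
    """
    all_items = [item for cluster in clusters for item in cluster]
    if not all_items:
        return 0.0
    n_attrs = len(all_items[0])

    def sum_squared_probs(items):
        # Sum over attributes i and values j of P(A_i = V_ij)^2 within items.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(item[i] for item in items)
            total += sum((c / len(items)) ** 2 for c in counts.values())
        return total

    baseline = sum_squared_probs(all_items)  # unconditional term
    score = 0.0
    for cluster in clusters:
        if cluster:
            weight = len(cluster) / len(all_items)  # P(C_k)
            score += weight * (sum_squared_probs(cluster) - baseline)
    return score / len(clusters)


# Hypothetical samples described by whether one peak is present or absent.
pure = [[("present",), ("present",)], [("absent",), ("absent",)]]
mixed = [[("present",), ("absent",)], [("present",), ("absent",)]]
print(category_utility(pure))   # 0.25
print(category_utility(mixed))  # 0.0
```

A partition that separates the attribute values cleanly scores higher than one that mixes them, which is the property COBWEB relies on when deciding where to place a new sample in its hierarchy.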

  • 【Online publication contributor】 Donghua University
  • 【Online publication year/issue】 2004, Issue 03
  • 【Classification number】 TP391.4
  • 【Times cited】 1
  • 【Downloads】 154