A Study of Several Problems in the Mahalanobis-Taguchi System
Research on Mahalanobis-Taguchi System Theory
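【Author】 Chen Xianglai (陈湘来);
【Advisor】 Han Zhijun (韩之俊);
【Author information】 Nanjing University of Science and Technology, Management Science and Engineering, 2008, PhD
【Subtitle】 Taking medical data as an example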
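【Summary】 The Mahalanobis-Taguchi System (MTS) is a pattern recognition method first proposed by the Japanese quality engineer Dr. Genichi Taguchi. It takes the signal-to-noise ratio (SN ratio) based on the Mahalanobis distance (MD) as its optimization index, uses two-level orthogonal arrays to select effective feature variables, and classifies and discriminates data through the Mahalanobis distances of samples. MTS is now applied in many fields internationally and has produced large economic and social benefits. In China, however, theoretical and applied research on MTS has only just begun, and the research base remains weak.
This dissertation proceeds as follows. First, it systematically reviews recent progress in MTS theory and applications in China and abroad. Second, building on an in-depth analysis of MTS, it integrates a distance statistic with a similarity coefficient statistic to construct a new class separability index that reflects both how close samples are in distance and how similar they are in shape. Third, it computes the entropy of each feature variable to assess the validity of Class I feature variables, and applies fuzzy cluster analysis to group similar feature variables together and thereby identify Class II feature variables. Fourth, according to the type of classification (ordered partition or general partition), it uses the 3σ rule and the disturbing fuzzy analysis method, respectively, to extend two-class discrimination to multi-class discrimination and to give a general multi-class discrimination criterion. Finally, it applies MTS to disease diagnosis, using theory to guide practice and offering new techniques and methods for disease diagnosis in China.
The main research content and conclusions are:
1) Comparison of distance statistics. In the classical MTS, the class separability index is the Mahalanobis distance statistic. In theory, compared with other distance measures, the Mahalanobis distance is better founded: it accounts for correlation among variables and for the effect of measurement units, and it is invariant under linear transformation. Experiments also show that the Mahalanobis distance discriminates better.
2) Extension of the class separability index. The classical MTS uses a distance statistic as its class separability index. A distance statistic can effectively identify how close samples are in distance, but it cannot accurately measure how similar they are in shape. In some recognition settings, similarity in shape matters more than closeness in distance. This dissertation therefore extends the class separability index of the classical MTS by integrating a distance statistic with a similarity coefficient statistic into a sample approximation statistic that captures both closeness in distance and similarity in shape.
3) Feature variable selection. The classical MTS judges the validity of feature variables with orthogonal arrays and SN ratios. This approach is computationally cumbersome, and the amount of computation multiplies as the number of feature variables grows. Following Taguchi's idea of data-based analysis, this dissertation discusses the use of the entropy method for optimizing feature variable selection, sets out its basic principle and calculation procedure, and demonstrates its effectiveness on a practical example. The two-level orthogonal array method and the entropy method can remove feature variables that harm recognition or contribute almost nothing to it (Class I feature variables), but they have difficulty identifying feature variables that play similar roles in the final recognition result (Class II feature variables). The author applies fuzzy cluster analysis to the feature variables so that similar variables fall into one class, thereby identifying the Class II feature variables.
4) Multi-class discrimination with MTS. In the classical MTS, the reference space (base space) is defined by a single normal population, so the method works well for the two-class problem of deciding whether a test sample is normal. It does not, however, handle multi-class discrimination well. For the two types of classification (ordered partition and general partition), this dissertation uses the 3σ rule and the disturbing fuzzy analysis method, respectively, to study multi-class discrimination, and sets out the principle and calculation procedure.
5) Application of MTS to medical diagnosis. MTS is based on data analysis rather than on the probability distributions of the variables, which gives it good practical value. This dissertation selects pulmonary disease, a typical class of disease that is fairly difficult to diagnose clinically. A quantity of data from healthy subjects serves as training samples for constructing the base space for the disease, which is then optimized by the feature optimization methods. A disturbing fuzzy relation between the feature variables and the disease types is constructed and applied to each test sample to determine its disease type, achieving the goal of diagnosis.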
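The scaled Mahalanobis distance that the summary refers to can be sketched as follows. This is a minimal illustration of the standard MTS construction (standardize with the normal group's mean and standard deviation, then divide the quadratic form by the number of variables), not the dissertation's own code. The function names and the two-variable data set are hypothetical, and the matrix inverse is a small Gauss-Jordan routine so that the example needs only the standard library.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def fit_base_space(normal):
    """Estimate means, standard deviations and the inverse correlation matrix
    from the normal (reference) group only."""
    n, k = len(normal), len(normal[0])
    means = [sum(row[j] for row in normal) / n for j in range(k)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in normal) / n)
            for j in range(k)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in normal]
    corr = [[sum(r[a] * r[b] for r in z) / n for b in range(k)] for a in range(k)]
    return means, stds, invert(corr)

def scaled_md(x, means, stds, inv_corr):
    """Scaled Mahalanobis distance: z^T R^{-1} z / k."""
    k = len(x)
    z = [(x[j] - means[j]) / stds[j] for j in range(k)]
    return sum(z[a] * inv_corr[a][b] * z[b]
               for a in range(k) for b in range(k)) / k

# Hypothetical normal group: six samples, two correlated variables.
normal = [[5.1, 120.0], [4.8, 115.0], [5.5, 130.0],
          [5.0, 118.0], [5.3, 127.0], [4.9, 121.0]]
means, stds, inv_corr = fit_base_space(normal)
normal_mds = [scaled_md(x, means, stds, inv_corr) for x in normal]

# Hypothetical abnormal sample that breaks the correlation pattern.
abnormal_md = scaled_md([7.5, 100.0], means, stds, inv_corr)
print(sum(normal_mds) / len(normal_mds), abnormal_md)
```

With this scaling the distances of the normal group average to one, so a test sample whose distance is far above one lies outside the base space.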
【Abstract】 Dr. Taguchi developed the Mahalanobis-Taguchi System (MTS), an emerging pattern recognition method grounded in quality engineering. MTS was the first method to use orthogonal arrays for variable selection. It takes the SN ratio, computed from Mahalanobis distances, as the optimization target and selects the useful variables with a two-level orthogonal array. MTS is now widely used and has created tremendous economic and social benefits. In China, theoretical and applied research on MTS has only just begun, and it needs a great deal of manpower and resources.
The dissertation is organized as follows. First, it reviews MTS and reports the latest progress. Second, based on an analysis of MTS, it integrates distance with a similarity coefficient to create a new classification index that reflects both the closeness of samples in distance and the similarity of their shapes. Third, the entropy value of each variable is used to assess the validity of Class I variables, and fuzzy cluster analysis is used to identify Class II variables. Fourth, according to the type of classification, the 3σ rule and disturbing fuzzy sets are used to study the theory and design model of MTS for multi-class discrimination. Finally, as an example, the MTS approach is applied to medical diagnosis with satisfactory results, providing a new technique and method for disease diagnosis in China.
The research and main conclusions of this dissertation are as follows:
1) Distance in MTS. The original MTS uses the Mahalanobis distance as its classification index. This choice is appropriate because the Mahalanobis distance accounts for correlation among variables and for the effect of measurement units, and it is invariant under linear transformation. In practice, the Mahalanobis distance also discriminates better than other distances.
2) Extension of the classification index. The original MTS adopts distance as its classification index. A distance index reflects how close samples are in space, but it cannot reflect the similarity of the samples' shapes. This dissertation therefore extends the classification index by integrating distance with a similarity coefficient, so that it reflects the similarity of samples in both respects.
3) Variable selection. The original MTS uses orthogonal arrays and SN ratios to select variables. This approach becomes more complex as the number of variables increases. Following Taguchi's idea of data-based analysis, this dissertation applies entropy to variable selection and gives an example to show the effectiveness of the entropy method. The experimental design method and the entropy method can identify harmful variables (Class I variables), but they cannot identify variables that have similar effects (Class II variables). This dissertation uses fuzzy clustering to identify the Class II variables.
4) Multi-class recognition. In the original MTS, because the base space is constructed only from normal data, the method is better suited to two-class than to multi-class discrimination. For the two kinds of sample classification, this dissertation uses the 3σ rule and disturbing fuzzy sets to solve the multi-class discrimination problem.
5) Application of MTS to medical diagnosis. MTS is based on data analysis rather than on probability distributions, and it is well suited to practical application. This dissertation takes pulmonary disease as its object of study. Using a quantity of normal and abnormal data, we construct the base space and select the useful variables. By constructing the disturbing fuzzy relation between the variables and the disease types, we compute the fuzzy number of each sample and so achieve the purpose of medical diagnosis.
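The abstract does not give the entropy formula used for variable selection, so the following is only a sketch of one common formulation, the entropy weight method, under stated assumptions. Each column is normalized to proportions, its Shannon entropy is scaled by ln(n), and variables with entropy close to one (nearly constant across samples) receive weights close to zero. The function name `entropy_weights` and the data are hypothetical, and the values are assumed non-negative with positive column sums.

```python
import math

def entropy_weights(data):
    """Return (entropies, weights) for the columns of a non-negative data matrix.

    Assumed formulation (entropy weight method), not taken from the dissertation:
      p_ij = x_ij / sum_i x_ij
      e_j  = -(1 / ln n) * sum_i p_ij * ln p_ij
      w_j  = (1 - e_j) / sum_j (1 - e_j)
    """
    n, k = len(data), len(data[0])
    entropies = []
    for j in range(k):
        col = [row[j] for row in data]
        total = sum(col)
        props = [v / total for v in col]
        # Terms with p == 0 contribute nothing to the entropy.
        e = -sum(p * math.log(p) for p in props if p > 0) / math.log(n)
        entropies.append(e)
    divergence = [1.0 - e for e in entropies]
    s = sum(divergence)
    return entropies, [d / s for d in divergence]

# Hypothetical samples: variable 0 is constant, variable 1 varies strongly,
# variable 2 varies only slightly.
samples = [[1.0, 10.0, 3.0],
           [1.0, 20.0, 3.1],
           [1.0, 5.0, 2.9],
           [1.0, 40.0, 3.0]]
entropies, weights = entropy_weights(samples)
print(entropies, weights)
```

In this sketch a variable with near-zero weight carries almost no information for separating samples, which is the sense in which it would be a candidate for removal. How the dissertation maps entropy values to Class I variables is not stated in the abstract.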
【Key words】 Mahalanobis-Taguchi system; data classification; discriminant analysis; disease diagnosis;