Research on Heterogeneous Value Difference Metric and Its Applications in MDS (混合值差度量及其在MDS中的应用研究)

【Author】 Du Jiajie (杜家杰)

【Advisor】 Duan Huichuan (段会川)

【Author information】 Shandong Normal University, Computer Software and Theory, 2012, Master's thesis

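【Abstract (translated from the Chinese)】 Multidimensional scaling (MDS) is a traditional multivariate statistical method. In the decades since it was proposed, its range of application has steadily widened as research has deepened, and applied research on MDS remains very active. MDS has been widely used in economics, management science, psychology, sociology, archaeology, biology, medicine, chemistry, network analysis, and many other fields, with good economic and social benefits. Research abroad leads in this area.

The input to MDS is generally a set of pairwise similarity measures among a group of objects, usually based on the distances between the objects. The choice of distance function strongly affects the quality of the MDS result, especially when the objects are described by attributes of mixed types. At present, MDS is usually based on the Euclidean distance. However, because the Euclidean distance depends on the units of each variable and ignores the correlations among variables, the MDS result is affected to some degree. In particular, the Euclidean distance does not handle nominal attributes directly: nominal values are usually first replaced by numbers and then processed as numeric attributes, which fundamentally negates the inherent nature of nominal attributes and loses information.

On the other hand, MDS is divided into metric MDS, used for quantitative processing, and non-metric MDS, used for qualitative processing. Non-metric MDS places only a loose requirement on the relationship between the (dis)similarities and the distances between objects: it needs a monotonic rank-order relationship, not a quantitative one, so it is fairly effective for ordinal data. For nominal data, however, metric MDS is not necessarily effective. Because metric MDS performs quantitative analysis and can reveal the internal structure of the data more precisely than the qualitative non-metric MDS, for complete data sets that contain nominal data we can consider applying metric MDS once the preprocessing of the nominal data has been improved.

Therefore, considering the limitations of the Euclidean distance and the characteristics of MDS itself, we adopt the Heterogeneous Value Difference Metric (HVDM), modified to suit the practical problem, to preprocess the data and improve the accuracy of MDS on nominal data. Experiments on the UCI Abalone data set show that this method outperforms the traditional quantification method in both reconstruction ability and reconstruction accuracy. In the real world, objects must be described from many aspects, so computing distances between objects with mixed-type attributes that include nominal data is common. Our work provides some support for this task.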
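For orientation, the block below writes out HVDM in its commonly cited standard form (due to Wilson and Martinez). It is offered as background only: the abstract states that the thesis modifies HVDM for its problem but does not say how, so the modified form is not reproduced here. The symbols are assumptions of this note: m is the number of attributes, sigma_a the standard deviation of numeric attribute a, C the number of output classes, N_{a,v} the number of training instances with value v for attribute a, and N_{a,v,c} the number of those that belong to class c.

```latex
\[
\mathrm{HVDM}(x, y) = \sqrt{\sum_{a=1}^{m} d_a(x_a, y_a)^2},
\qquad
d_a(x_a, y_a) =
\begin{cases}
1, & \text{if } x_a \text{ or } y_a \text{ is unknown},\\
\mathrm{normalized\_vdm}_a(x_a, y_a), & \text{if attribute } a \text{ is nominal},\\
\mathrm{normalized\_diff}_a(x_a, y_a), & \text{if attribute } a \text{ is numeric},
\end{cases}
\]
\[
\mathrm{normalized\_diff}_a(x_a, y_a) = \frac{|x_a - y_a|}{4\sigma_a},
\qquad
\mathrm{normalized\_vdm}_a(x_a, y_a) =
\sqrt{\sum_{c=1}^{C} \left| \frac{N_{a,x_a,c}}{N_{a,x_a}} - \frac{N_{a,y_a,c}}{N_{a,y_a}} \right|^{2}}
\]
```

The nominal term compares two values by how differently they distribute over the classes, so no arbitrary numeric code has to be assigned to a value such as sex or color.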

【Abstract】 Multidimensional scaling (MDS) is a traditional multivariate statistical method. In the decades since it was proposed, its range of application has grown steadily as research has deepened. Applied research on MDS remains very active. MDS has been widely used in many areas, such as economics, management science, psychology, sociology, archaeology, biology, medicine, chemistry, and network analysis, and good economic and social benefits have been achieved. In this regard, foreign researchers are at the forefront.

MDS is run on a (dis)similarity matrix, which is obtained by calculating the distances between objects on nondimensionalized data. The method for calculating the distance has a great impact on the output of MDS, especially when the objects are described by mixed attributes. In general, MDS uses the Euclidean distance to measure the (dis)similarity of objects. But because the Euclidean distance depends on the units of the attributes and ignores the correlation between attributes, the output of MDS is affected to some extent. In particular, if objects have nominal attributes, such as sex or color, the common practice is to encode them as numbers first and then apply the Euclidean distance. This approach is not reasonable, because it negates the inherent characteristics of nominal attributes and results in a loss of information.

On the other hand, MDS has two types: metric MDS for quantitative processing and non-metric MDS for qualitative processing. Metric MDS creates a configuration of points whose inter-point distances approximate the given dissimilarities. Instead of trying to approximate the dissimilarities themselves, non-metric MDS approximates a nonlinear, but monotonic, transformation of them. So non-metric MDS works better on ordinal data, but not necessarily on nominal data. Because metric MDS is quantitative and can reveal the internal structure of data more accurately than non-metric MDS, we prefer to adopt metric MDS on a complete data set that contains nominal data, on the assumption that the nominal data can be preprocessed appropriately.

Therefore, considering the limitations of the Euclidean distance and the characteristics of MDS itself, we apply the Heterogeneous Value Difference Metric (HVDM), a distance metric that handles nominal attributes differently from the Euclidean distance, to MDS to make its treatment of nominal attributes more reasonable. Experimental results on the UCI Abalone data set show that the proposed method gives promising results in both reconstruction ability and accuracy.

In the real world, the characteristics of an object need to be described from different aspects, so computing distances between objects with mixed attributes that include nominal data is common. Our work provides some support for this task.
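The pipeline described in the abstract (an HVDM distance matrix fed to metric MDS) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: it uses the standard HVDM of Wilson and Martinez rather than the thesis's modified version, it uses classical (Torgerson) MDS as one form of metric MDS, the function names `hvdm_matrix` and `classical_mds` are hypothetical, and the six rows of toy data are made up for illustration and are not drawn from the Abalone data set.

```python
import numpy as np

def hvdm_matrix(X_num, X_nom, y):
    """Pairwise standard HVDM distances (a sketch, not the thesis's variant).

    X_num: (n, p) numeric attributes; X_nom: (n, q) nominal attributes;
    y: (n,) class labels used by the nominal (value-difference) term.
    """
    X_num = np.asarray(X_num, dtype=float)
    X_nom = np.asarray(X_nom)
    y = np.asarray(y)
    n = X_num.shape[0]
    sq = np.zeros((n, n))  # accumulates squared per-attribute distances

    # Numeric attributes: |x - y| / (4 * standard deviation of the attribute).
    sigma = X_num.std(axis=0)
    sigma[sigma == 0] = 1.0  # a constant column contributes zero distance
    for a in range(X_num.shape[1]):
        diff = np.abs(X_num[:, a, None] - X_num[None, :, a]) / (4.0 * sigma[a])
        sq += diff ** 2

    # Nominal attributes: compare class distributions P(class | value).
    classes = np.unique(y)
    for a in range(X_nom.shape[1]):
        values, codes = np.unique(X_nom[:, a], return_inverse=True)
        codes = codes.ravel()
        probs = np.array([[np.mean(y[codes == v] == c) for c in classes]
                          for v in range(len(values))])
        # Squared Euclidean distance between the two class distributions.
        vdm_sq = ((probs[:, None, :] - probs[None, :, :]) ** 2).sum(axis=2)
        sq += vdm_sq[codes[:, None], codes[None, :]]

    return np.sqrt(sq)

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]            # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)       # HVDM need not be Euclidean; drop negatives
    return V[:, idx] * np.sqrt(w_top)

# Made-up toy data: one numeric attribute, one nominal attribute, binary class.
X_num = np.array([[0.45], [0.35], [0.53], [0.44], [0.33], [0.42]])
X_nom = np.array([["M"], ["M"], ["F"], ["F"], ["I"], ["I"]])
y = np.array([1, 0, 1, 1, 0, 0])

D = hvdm_matrix(X_num, X_nom, y)
Y = classical_mds(D, k=2)
print(Y.shape)
```

The nominal attribute is never replaced by an arbitrary number here: two values are far apart only if they predict different classes, which is the property the abstract contrasts with numeric encoding followed by the Euclidean distance.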
