

【Author】 Sheng Ying (盛英)

【Advisors】 Zhang Jifu (张继福); Yang Haifeng (杨海峰)

【Author Information】 Taiyuan University of Science and Technology, Computer Software and Theory, 2012, Master's

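【Abstract (translated from the Chinese)】 Data dimensionality reduction means finding, for a given sample space and according to a specific criterion, a low-dimensional representation of high-dimensional data that preserves the latent intrinsic information of the original data. It mainly addresses the curse of dimensionality. Depending on whether label information and pairwise constraints are available, it falls into three types: supervised, unsupervised, and semi-supervised dimensionality reduction. Taking the traditional dimensionality reduction algorithms FDA and PCA as its objects of study, this thesis establishes a semi-supervised dimensionality reduction framework, carries out semi-supervised dimensionality reduction analysis of spectral data, and studies the choice of the semi-supervised reduction coefficient and the choice of label information for spectral data. The main results are as follows. 1. A semi-supervised dimensionality reduction framework based on FDA and PCA is established. By analyzing and comparing the Fisher discriminant analysis and PCA algorithms, the framework identifies their shortcomings in feature extraction: Fisher discriminant analysis is a supervised algorithm whose result overfits the labeled data, while PCA is an unsupervised algorithm that cannot make effective use of the label information in the samples. Experiments verify the effectiveness of the framework. 2. A semi-supervised feature dimensionality reduction method for celestial spectral data, based on Fisher discriminant analysis, is given. For celestial spectral data, the method first establishes an uncertainty relation in which the balance between Fisher discriminant analysis and PCA can be varied. It then constructs the global optimization form of the semi-supervised reduction and computes the result by eigenvalue decomposition, which effectively overcomes the overfitting problem in celestial spectral dimensionality reduction. Finally, experiments on SDSS celestial spectral feature-line data sets of high-redshift quasars and late-type stars verify the effectiveness of the method.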

【Abstract】 Data dimensionality reduction finds a low-dimensional representation of high-dimensional data in a given sample space according to a specific criterion, while preserving the intrinsic information of the original data. It mainly addresses the curse of dimensionality. According to the available label information, it is divided into three types: supervised, unsupervised, and semi-supervised dimensionality reduction. Starting from the traditional algorithms FDA and PCA, this thesis builds a semi-supervised dimensionality reduction framework and uses it to carry out semi-supervised dimensionality reduction of spectral data. It also studies the choice of the semi-supervised dimensionality reduction coefficient and the choice of label information. The main results are as follows. (1) A semi-supervised framework based on FDA and PCA is built. Fisher discriminant analysis and PCA are compared within the framework, and their shortcomings in feature extraction are pointed out: Fisher discriminant analysis overfits the labeled data, while PCA cannot make effective use of label information. Experiments confirm this analysis. (2) A semi-supervised dimensionality reduction method for spectral features, based on Fisher discriminant analysis, is presented. First, for celestial spectral data, an uncertainty relation is established in which the balance between Fisher discriminant analysis and PCA can be varied. Second, the global optimization form of the semi-supervised dimensionality reduction is constructed, and the reduction result is computed by eigenvalue decomposition, which overcomes the overfitting problem in the dimensionality reduction of astronomical spectral data. Finally, the method is validated experimentally on the hzqso (high-redshift quasar) and mstar (late-type star) astronomical spectral feature-line data sets.
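
The abstract does not give the thesis's objective function, so the following is only a minimal sketch of one common way to combine FDA and PCA in a single eigenvalue problem. It is not the author's formulation. The function name `semi_supervised_projection`, the blending coefficient `beta`, and the small `ridge` term are all assumptions made for illustration. The sketch mixes the labeled-data scatter matrices of FDA with the total scatter of all samples used by PCA, then solves the resulting generalized eigenproblem.

```python
import numpy as np


def semi_supervised_projection(X_labeled, y, X_unlabeled, n_components,
                               beta=0.5, ridge=1e-8):
    """Illustrative blend of FDA and PCA (an assumed form, not the thesis's).

    beta = 1 reduces to plain PCA on all samples; smaller beta puts more
    weight on the FDA scatter matrices computed from the labeled samples.
    Returns a (n_features, n_components) projection matrix W.
    """
    X_labeled = np.asarray(X_labeled, dtype=float)
    X_unlabeled = np.asarray(X_unlabeled, dtype=float)
    y = np.asarray(y)
    X_all = np.vstack([X_labeled, X_unlabeled])
    d = X_all.shape[1]

    # PCA part: total scatter of every sample, labeled or not.
    centered = X_all - X_all.mean(axis=0)
    S_t = centered.T @ centered / len(X_all)

    # FDA part: between-class and within-class scatter of labeled samples.
    mean_l = X_labeled.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X_labeled[y == c]
        diff = (Xc.mean(axis=0) - mean_l)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        Xc_centered = Xc - Xc.mean(axis=0)
        S_w += Xc_centered.T @ Xc_centered
    S_b /= len(X_labeled)
    S_w /= len(X_labeled)

    # Assumed blending rule: interpolate between the FDA and PCA criteria.
    A = (1.0 - beta) * S_b + beta * S_t
    B = (1.0 - beta) * S_w + beta * np.eye(d) + ridge * np.eye(d)

    # Solve A v = lambda B v through a Cholesky factorization B = L L^T,
    # which turns it into an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    C = (C + C.T) / 2.0  # remove round-off asymmetry before eigh
    eigvals, U = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ U[:, order]
```

A caller would project data with `X @ W`, where `W` is the returned matrix. In a sketch like this, `beta` plays the role of the semi-supervised reduction coefficient mentioned in the abstract, and how to choose it is exactly the kind of question the thesis reports studying.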


