
The Application of Clustering and Principal Component Regression in the Economic Indicator Data

【Author】 Jiang Yang

【Advisors】 Zhou Chunguang; Wang Zhe

【Author information】 Jilin University, Software Engineering, 2010, Master's thesis

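【Summary】 In the 60 years since the founding of the People's Republic, China's urban social and economic development has changed dramatically. Urban construction advances rapidly, and the quality of life and living environment of urban residents have improved greatly. The main data source for this thesis is Table 10-3 of the 2009 China Statistical Yearbook: main economic indicators of provincial capitals and cities with independent planning status. The data describe 23 economic indicators for these cities (36 in total). The experimental methods used are clustering and principal component regression. The main results are as follows. 1. Clustering: the 36 cities are classified on the basis of 22 economic indicators. The K-means algorithm in SPSS is used to partition the 36 cities into two classes and into three classes, and the resulting classes are used to discuss the gaps in economic development between cities. 2. A regression model is built for total urban population: among the 22 economic indicator variables, those related to total population are identified, which reduces the dimension of the 22 variables. Ten indicators are selected, and a multiple linear regression model between total urban population and these 10 indicators is established. Carrying out the clustering and the principal component regression gave the author proficiency in operating and applying the social statistics software SPSS, as well as a deeper understanding of clustering, principal component analysis, and regression analysis. The focus and main difficulty of the thesis is principal component regression. Principal component analysis performs dimension reduction; regression analysis builds a regression model between independent and dependent variables; principal component regression combines the two and achieves dimension reduction and regression at the same time.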
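
The clustering step summarized above was carried out in SPSS. As a rough illustration only, the following pure-Python sketch shows the K-means idea on a made-up table of six "cities" with three indicators. The toy values, the function names, the z-score standardization, and the farthest-point initialization are all assumptions of this sketch; it is not the SPSS procedure and does not reproduce the thesis data.

```python
def standardize(rows):
    """Z-score each column so indicators on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid division by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(rows, k):
    """Deterministic start (an assumption of this sketch): take the first row,
    then repeatedly add the row farthest from the centers chosen so far."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and center update until stable."""
    centers = init_centers(rows, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


# Hypothetical toy data: 6 "cities" x 3 indicators, two clearly separated groups.
toy = [
    [9.0, 8.5, 9.2],
    [8.7, 9.1, 8.8],
    [9.3, 8.9, 9.0],
    [2.1, 1.8, 2.5],
    [1.9, 2.2, 2.0],
    [2.4, 2.0, 1.7],
]
labels, centers = kmeans(standardize(toy), k=2)
print(labels)
```

Calling `kmeans` with `k=2` and then `k=3` on the same standardized table mirrors the two-class and three-class partitions described in the summary.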

【Abstract】 China's urban social and economic development has changed tremendously since the founding of the People's Republic in 1949. With rapid urbanization, the layout of urban development has become more rational and the economic structure has improved further. The urban economy plays an important role in the national economy. Cities are being built rapidly, and urban quality of life and living conditions have improved greatly.

The main data source of this thesis is Table 10-3 of the 2009 China Statistical Yearbook, the main economic indicators of provincial capitals and cities with independent planning status. The data describe 23 economic indicators for these cities (36 in total). The values show the development gap between cities, mainly in medical care, public health, education, transportation, and other areas.

The thesis analyzes the economic indicator data with the SPSS statistical software. SPSS offers complete data management and statistical analysis functions. It is simple to use, requires no programming, is powerful, and provides a convenient data interface; its function modules can also be combined flexibly. Its functions include data entry, editing, statistical analysis, reporting, and chart production. It has 136 functions of its own, in 11 types.
SPSS provides both simple descriptive statistics and complex multi-factor statistical methods, such as exploratory data analysis, descriptive statistics, contingency table analysis, bivariate correlation, rank correlation, partial correlation, analysis of variance, nonparametric tests, multiple regression, survival analysis, analysis of covariance, discriminant analysis, factor analysis, cluster analysis, nonlinear regression, and logistic regression.

The economic indicator data were processed in SPSS, and the research covers two main aspects:

1. Clustering analysis. Cluster analysis is used to classify the 36 cities by their economic indicator data, according to 22 attributes. Classifying the provincial capitals and cities with independent planning status by their economic indicators reveals the gaps between the resulting groups of cities.

2. Principal component regression. Principal component regression is the focus of the thesis. It combines principal component analysis with regression analysis: first, principal component analysis is applied to the attributes to reduce their dimension; then a regression relationship is established between the target variable and the small number of resulting independent variables.

The main purpose of principal component analysis is to explain most of the variation in the original data with fewer variables. It transforms a set of correlated variables into variables that are mutually independent or uncorrelated. Usually a few new variables, fewer than the original variables, are chosen that account for most of the variation; these are called principal components and serve as composite indices of the information. Principal component analysis is in effect a dimension-reduction method.

The main purpose of regression analysis is to establish a regression model.
Regression analysis specifies the dependent and independent variables and the assumed causal relationship between them, estimates the parameters of the model from experimental data, and then evaluates how well the model fits the measured data; if the fit is good, the model can be used to predict the dependent variable.

This thesis describes applied research on principal component regression with economic indicator data. It studied the relationship between total urban population (Y) and a number of economic indicators by principal component regression.

First, collinearity was checked by regression analysis. A regression model was fitted between total population and the 21 economic indicators, and backward elimination yielded 10 economic indicators related to total population. Because the model revealed collinearity, principal component analysis was applied to these 10 indicators.

Second, the data had to be checked for suitability for extracting principal components. The KMO value was 0.8 or above and the scree plot showed a steep initial slope, so the data were suitable for principal component analysis. Two principal components were extracted from the 10 economic indicators; together they reflect more than 80% of the information in the 10 indicators, with a cumulative contribution rate of the first two eigenvalues of 83.887%. From the initial factor loadings, expressions for the two principal components (F1, F2) in terms of the 10 economic indicators were obtained. The principal component scores were computed by multiplying the eigenvectors by the standardized data, and a composite principal component score was also derived.

Finally, the thesis established the regression model between total urban population and the 10 economic indicators, in two steps.
First, a regression model between the city's total population and the two principal components F1 and F2 was fitted with the SPSS regression procedure. Then the expressions for the principal components were substituted into it to obtain the regression model between total population and the 10 economic indicators.

Through this research on economic indicator data, I gained a better understanding of clustering, principal component analysis, and regression analysis. I learned the principles behind principal component regression and mastered several SPSS operations.
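
The principal component regression workflow described in the abstract (standardize, extract components, regress on the component scores, then express the model in terms of the original indicators) can be sketched with NumPy as below. This is only an illustration under stated assumptions: the thesis used SPSS, the data here are synthetic (36 rows, 6 collinear columns), and the function name `pcr_fit`, the 80% cumulative-variance cutoff, and the returned dictionary keys are hypothetical choices made for this sketch.

```python
import numpy as np


def pcr_fit(X, y, var_threshold=0.80):
    """Principal component regression sketch (assumes no constant columns)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # 1. Standardize the indicators.
    mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sigma

    # 2. Eigen-decompose the correlation matrix, largest eigenvalue first.
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 3. Keep the fewest components whose cumulative contribution
    #    reaches the threshold (an assumed cutoff of 80%).
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    m = int(np.searchsorted(cum_ratio, var_threshold) + 1)
    V = eigvecs[:, :m]

    # 4. Component scores = standardized data times eigenvectors.
    F = Z @ V

    # 5. Ordinary least squares of y on the scores (with intercept).
    A = np.column_stack([np.ones(len(y)), F])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)

    # 6. Substitute the component expressions to get coefficients
    #    on the original (unstandardized) indicators.
    beta = (V @ gamma[1:]) / sigma
    intercept = gamma[0] - mu @ beta

    return {"n_components": m, "cum_ratio": cum_ratio,
            "intercept": intercept, "beta": beta, "fitted": A @ gamma}


# Synthetic, deliberately collinear data: two latent factors drive six columns.
rng = np.random.default_rng(0)
n = 36
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1 + 0.1 * rng.normal(size=n) for _ in range(3)] +
                    [f2 + 0.1 * rng.normal(size=n) for _ in range(3)])
y = 2.0 * f1 - 1.0 * f2 + 0.1 * rng.normal(size=n)

model = pcr_fit(X, y)
y_hat = model["intercept"] + X @ model["beta"]
print(model["n_components"], model["cum_ratio"][:model["n_components"]])
```

Because the component scores are linear combinations of the standardized indicators, the model on the original indicators (`y_hat`) gives the same fitted values as the regression on the components (`model["fitted"]`), which is the substitution step the abstract describes.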

  • 【Online publication contributor】 Jilin University
  • 【Online publication issue】 2010, No. 10
  • 【Classification codes】 F224; F299.2
  • 【Citations】 22
  • 【Downloads】 3201