

The Research of Clustering Algorithm and Its Application in Teaching Quality Evaluation System

【Author】 Li Xinliang (李新良)

【Advisor】 Chen Xiangtao (陈湘涛)

【Author information】 Hunan University, Computer Applications, 2008, Master's thesis

【Abstract】 Building effective, scalable clustering algorithms for large-scale, high-dimensional data is an active research topic in data mining. This thesis studies clustering algorithms from the following angles. First, it analyzes the strengths and weaknesses of three existing cluster initialization methods and proposes a new distance-based initialization method. The method requires no threshold, is unaffected by the order of the data set, strongly suppresses outliers and noisy data, and is suitable for initializing clustering on relatively large data sets. Second, to address the shortcomings of grid-based and density-based clustering, it proposes a combined clustering algorithm (CUBN) that integrates grid-based, density-based, and distance-based methods. CUBN can identify clusters of arbitrary shape, size, and differing density, filters noisy data effectively, has simple parameter settings, does not need the number of clusters to be given in advance, and has near-linear time complexity, which makes it suitable for clustering large data sets. Third, after examining the shortcomings of existing hierarchical clustering methods, it proposes a new hierarchical clustering algorithm (CMM). CMM uses a partitioning method to divide the data into atomic clusters and then performs bottom-up hierarchical clustering on those atomic clusters to obtain the final result. It is insensitive to input parameters, robust to outliers and to wide variance in cluster size, filters noisy data effectively, and runs efficiently. Finally, to address problems in university teaching evaluation systems, the thesis builds on this clustering work to develop a prototype of a data-mining-based teaching evaluation system. It introduces the Analytic Hierarchy Process (AHP) from decision science to solve the problems of index construction and weight assignment. Building on a study of association rule algorithms, it then applies cluster analysis to college evaluation data, which effectively reduces the volume of data to be analyzed and overcomes the unreasonableness of classifying by score.
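The abstract describes the hierarchical method only in outline: partition the data into atomic clusters, then merge those clusters bottom-up. The following Python sketch illustrates that general two-phase idea and nothing more. It is not the CMM algorithm from the thesis. The function names, the farthest-first seeding, the k-means-style partitioning, the centroid-distance merge criterion, and the `stop_distance` parameter are all assumptions chosen for illustration.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def farthest_first_seeds(points, k):
    """Pick k seeds: start at the first point, then repeatedly take the
    point farthest from all seeds chosen so far (an assumed seeding rule)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(
            max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        )
    return seeds


def partition_into_atoms(points, k, iters=10):
    """Phase 1 (assumed stand-in): k-means-style partitioning of the data
    into at most k small 'atomic' clusters."""
    centers = farthest_first_seeds(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda j: math.dist(p, centers[j])
            )
            groups[nearest].append(p)
        groups = [g for g in groups if g]  # drop atoms that lost all points
        centers = [centroid(g) for g in groups]
    return groups


def merge_atoms(atoms, stop_distance):
    """Phase 2 (assumed criterion): repeatedly merge the two clusters whose
    centroids are closest, stopping once that distance exceeds stop_distance."""
    clusters = [list(a) for a in atoms]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > stop_distance:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Two well-separated toy groups of 2-D points (illustrative data only).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    atoms = partition_into_atoms(data, k=4)
    for cluster in merge_atoms(atoms, stop_distance=3.0):
        print(sorted(cluster))
```

Merging a few atomic clusters instead of individual points keeps the quadratic merge step small, which is one plausible reading of the efficiency claim in the abstract. The merge criterion actually used in the thesis may differ.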

  • 【Online publication contributor】 Hunan University
  • 【Online publication issue】 2008, No. 12
  • 【Classification number】 TP311.13
  • 【Times cited】 11
  • 【Downloads】 259

