Research of Clustering Algorithm in Information Value-added (信息增值中的聚类分析算法研究)

【Author】 Zhao Hui (赵辉)

【Advisor】 Gu Junhua (顾军华)

【Author Information】 Hebei University of Technology, Computer Application Technology, 2003, Master's thesis

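【Abstract (translated from the Chinese)】 As information-acquisition technology has improved, every field of production and daily life has accumulated massive amounts of information. The great contribution of information to human society and economic development is concentrated in its value-adding effect. Data mining is currently the most widely used method for adding value to information. By analyzing data mining techniques and knowledge representation methods, this thesis argues that cluster analysis is the most effective method for dynamic information value-adding. Current clustering algorithms are generally sensitive to their initial values. Building on small-sample theory, this thesis proposes an algorithm that derives the initial values from a small sample set, which reduces the sensitivity to initial values and also improves the clustering results. For the dynamic information value-adding problem, the thesis analyzes the shortcomings of existing clustering algorithms and discusses the feasibility and necessity of combining cluster analysis with the ant algorithm, a knowledge-evolution algorithm. It then proposes using the ant algorithm to build a degree-constrained tree that represents the distribution of the data, which greatly improves the efficiency of traditional cluster analysis for dynamic information value-adding. Based on an analysis of experimental data, the method for building the degree-constrained tree is further refined, which improves the clustering results again. Finally, the thesis explains why adding value to college entrance examination scores is meaningful and applies cluster analysis in an information value-adding system for those scores, obtaining a series of meaningful conclusions.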
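
The abstract does not spell out how initial values are derived from a small sample set, so the following is only a minimal sketch of one plausible reading: run k-means on a small random sample, then use the resulting centers to start k-means on the full data. The choice of k-means, the helper names (`subsample_initial_centers`, `cluster_with_subsample_init`), and the parameters `sample_size` and `seed` are all assumptions for illustration, not the thesis's actual algorithm.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, centers, max_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def subsample_initial_centers(points, k, sample_size, rng):
    """Hypothetical helper: cluster a small random sample to get initial centers."""
    sample = rng.sample(points, min(sample_size, len(points)))
    seeds = rng.sample(sample, k)
    return kmeans(sample, seeds)


def cluster_with_subsample_init(points, k, sample_size=30, seed=0):
    """Run k-means on all points, started from centers found on a small sample."""
    rng = random.Random(seed)
    initial = subsample_initial_centers(points, k, sample_size, rng)
    return kmeans(points, initial)
```

Points are passed as a list of tuples, for example `cluster_with_subsample_init([(0.1,), (0.2,), (9.8,), (9.9,)], k=2, sample_size=4)`. Clustering the sample first is cheap, and the full-data run then starts from centers that already reflect the data's rough layout instead of from arbitrary points.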

【Abstract】 As information-acquisition technology develops, more and more information accumulates in every field. Once value has been added to information, it can play a more important role in society, so it is important to study how to make information more valuable. Data mining is currently the most widely used way to do this. In this thesis, we argue that clustering is the most efficient way to add value to information. However, existing clustering algorithms have two shortcomings. The first is that they depend heavily on the initial input. We propose an algorithm that finds the initial input based on sub-sample theory. On the test data, the new algorithm not only decreases this sensitivity but also generates better-quality clusters. The second shortcoming is that existing clustering algorithms do not handle dynamic data well. We develop another new algorithm to address this. Using the ant system, the distribution of the original data is gathered into a degree-constrained tree, and the clusters are then generated from the features of the tree. Our analysis of the results on the test data shows that the new algorithm is an efficient way to cluster dynamic data. In the last part of the thesis, we explain the importance of adding value to college entrance examination scores. After applying the proposed algorithm to the scores, we draw some useful conclusions about them.
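
The abstract describes generating clusters from a degree-constrained tree but gives no construction details. As a rough illustration only, the sketch below swaps the ant algorithm for a simple greedy, Kruskal-style heuristic (a deliberate substitution, not the thesis's method): it adds the shortest edges that neither form a cycle nor exceed a degree limit, then cuts the longest tree edges to obtain clusters. The function names, the `max_degree` parameter, and the edge-cutting rule are assumptions.

```python
import math
from itertools import combinations


def make_find(parent):
    """Return a union-find `find` (with path halving) over the parent list."""
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    return find


def degree_constrained_tree(points, max_degree=3):
    """Greedy stand-in for the ant-based construction (an assumption).

    Returns a list of (weight, i, j) edges forming a spanning tree in which
    no node has more than `max_degree` neighbours.
    """
    if max_degree < 2:
        raise ValueError("max_degree must be at least 2 to span all points")
    n = len(points)
    parent = list(range(n))
    find = make_find(parent)
    degree = [0] * n
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for weight, i, j in edges:
        if degree[i] >= max_degree or degree[j] >= max_degree:
            continue  # adding this edge would exceed the degree limit
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # adding this edge would create a cycle
        parent[root_i] = root_j
        degree[i] += 1
        degree[j] += 1
        tree.append((weight, i, j))
    return tree


def clusters_from_tree(points, k, max_degree=3):
    """Cut the k-1 longest tree edges and label the remaining components."""
    tree = degree_constrained_tree(points, max_degree)
    kept = sorted(tree)[: max(len(points) - k, 0)]
    parent = list(range(len(points)))
    find = make_find(parent)
    for _, i, j in kept:
        parent[find(i)] = find(j)
    # Points with the same label belong to the same cluster.
    return [find(i) for i in range(len(points))]
```

`clusters_from_tree(points, k)` returns one label per point, and points that share a label form a cluster. Because the tree summarizes the data's distribution once, a new point could in principle be attached to the tree without reclustering from scratch, which is presumably what makes the approach attractive for dynamic data; this sketch does not implement such updates.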

  • 【Classification Code】 TP311
  • 【Citation Count】 1
  • 【Download Count】 184