Research on New Link-Based Methods for Network Data Classification and Link Prediction

Research on Classification and Link Prediction of Network Data Based on Links

【Author】 Li Lina (李丽娜)

【Advisor】 Ouyang Jihong (欧阳继红)

【Author information】 Jilin University, Computer Application Technology, 2012, PhD

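【Abstract (translated from the Chinese)】 With the rapid development of information technology, represented by the Internet, human society has strode into the network age. Demand for network analysis is growing steadily, and network data mining has become an important research topic in data mining. Network data mining aims to extract implicit knowledge from network data sources and to carry out tasks such as entity classification, link prediction, community discovery, entity ranking, and network clustering, so as to analyze the properties, functions, and dynamic changes of networks and the relationships between networks. This thesis, situated in the field of network data mining, carries out an in-depth study of entity classification and link prediction. Building on a detailed analysis of existing entity classification methods, it focuses on the role of link relations in entity classification and proposes several solutions for different types of network data: for network data whose entities have attributes and labels, a regularized classification model based on principal component analysis and the extreme learning machine; for network data whose entities have attributes and labels but with few labeled entities, an active collective classification method combining feature selection and link filtering; for network data whose entities have only label information, a collective classification framework integrating network topology features and label distribution information. In addition, the thesis summarizes the state of research on link prediction and, for sparsely linked networks, proposes a collective link prediction framework with two concrete algorithms. Experiments on several public data sets show that the proposed methods achieve good results.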

【Abstract】 In recent years, with the rapid development of information technology represented by the Internet, human society has entered a network age. The demand for network analysis has kept rising, and network data mining has become an important new research field in data mining, with wide application in domains including document classification, protein structure prediction, natural language processing, and social network analysis. Network data mining aims to extract implicit knowledge from network data sources and to perform learning tasks such as entity classification, link prediction, community discovery, entity ranking, and network clustering, in order to analyze the nature, function, and dynamic change of networks and to understand the relationships between networks. As key parts of network data mining, entity classification and link prediction have attracted particular attention from researchers, and a great deal of work has been done. However, the accuracy of existing algorithms still needs to be improved, and there is little work on sparsely labeled and sparsely linked networks. This thesis takes these problems as its main topic.

The thesis first analyzes current entity classification methods in detail, including their underlying assumptions, the types of network data they suit, and their application domains. It then studies the role of links in entity classification and presents a series of solutions for different types of network data. In particular, for networks whose entities have attribute information, it presents a new regularization classification model based on principal component analysis and the extreme learning machine; for networks whose entities have attribute information but few of which are labeled, it presents a new active collective classification method that combines feature selection and link filtering; for networks whose entities have only label information, it presents an integrated collective classification framework for sparsely labeled networks. In addition, the thesis summarizes the state of research on link prediction and presents a collective link prediction framework for sparsely linked networks.

The detailed research results are as follows:

1. A thorough review of research on entity classification and link prediction. The thesis introduces and summarizes the research tasks of entity classification and link prediction, and points out problems in current approaches and directions for future research.

2. For networks whose entities have attribute information, a new regularization classification model based on principal component analysis and the extreme learning machine. The algorithm improves on current regularization methods in that it not only includes the smoothness constraint on the defined function but also considers the label distribution in the network. It adds two new regularization terms: an intra-class similarity term and an inter-class difference term. In the implementation, we extend the extreme learning machine so that it can be used for semi-supervised problems, and further derive the definition of the hidden-layer weights so that it fits the new objective function. Experimental results show that the method performs well when the ratio of labeled nodes exceeds 25%.

3. For networks whose entities have attribute information but few of which are labeled, a new active collective classification method that combines feature selection and link filtering. To improve the classification accuracy of collective classification methods, we extend them so that attribute information and link information can be combined to perform classification during collective inference. The algorithm first uses feature selection to find important features and constructs links according to attribute similarity; it then analyzes the original links in the network and selects the useful ones; finally, it combines the two kinds of links to classify nodes collectively. Experiments show that the method handles the sparsity problem well.

4. For networks whose entities have only label information, a collective classification framework for sparsely labeled networks. The framework divides node attributes into two categories: structure attributes and label attributes. The algorithm uses different attributes at different stages and integrates them to perform classification. Based on this framework, we present a new classifier based on the structure attributes of nodes, called the Laplacian classifier, and a new classifier based on label distribution, called the link pattern classifier. We compare our approach with typical collective classification methods, and the results indicate that it performs better than the other methods.

5. A collective link prediction framework for sparsely linked networks. We propose a collective link prediction framework that predicts related links simultaneously, so that it can handle sparsely linked networks as well as networks whose links depend on each other. Based on this framework, two new link prediction methods are presented: collective resource allocation and collective random walk. We test our methods on several networks, and the results indicate that they obtain higher prediction accuracy, especially in the sparsely linked case.

Network mining now attracts the interest of many researchers. This thesis studies the entity classification and link prediction problems in network mining and presents effective learning algorithms for different data types. The research on classification and link prediction in network data is of both theoretical and practical significance.
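
The abstract names "collective resource allocation" without defining it. As background only, the following is a minimal Python sketch of the classical, non-collective resource allocation index, a widely used similarity score for link prediction in which each common neighbour z of two unlinked nodes contributes 1/degree(z). It is not the thesis's collective method; the function name `resource_allocation_scores` and the toy graph are assumptions made for illustration.

```python
from itertools import combinations


def resource_allocation_scores(adjacency):
    """Score every unlinked node pair with the classical resource allocation index.

    adjacency: dict mapping each node to the set of its neighbours (undirected).
    Returns a dict {(u, v): score} covering only pairs that are not yet linked.
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        common = adjacency[u] & adjacency[v]
        # Each common neighbour z passes on 1/degree(z) of its "resource".
        scores[(u, v)] = sum(1.0 / len(adjacency[z]) for z in common)
    return scores


# Toy undirected graph (hypothetical data, for illustration only).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

# Candidate links ranked from most to least likely under this score.
ranked = sorted(resource_allocation_scores(toy_graph).items(),
                key=lambda item: item[1], reverse=True)
```

In a sparsely linked network many candidate pairs share no neighbours and all score zero under this index, which is the kind of limitation that a framework predicting related links jointly is meant to address.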

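  • 【Online publication contributor】 Jilin University
  • 【Online publication year/issue】 2012, Issue 10
  • 【Classification codes】 TP393.09; TP311.13
  • 【Citation count】 1
  • 【Download count】 653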