Research on Graph-Oriented Techniques for Extracting and Correcting Multiple Community Features

The Research of Community Feature Extraction and Feature Prediction in Complex Network

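【Author】 饶君 (Rao Jun)

【Advisor】 徐六通 (Xu Liutong)

【Author information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2013, Master's thesis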

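【Abstract (Chinese version)】 In recent years, the development of complex network theory has provided theoretical models and research methods for understanding many kinds of real networks. The telecom industry produces massive amounts of data every day, and telecom communication data has become one of the main subjects of complex network research. Understanding the community features of a network gives deeper insight into the structure and characteristics of its groups, and feature correction is a necessary step to guarantee that group features are described correctly. Feature extraction and correction in complex networks is therefore a very promising and challenging research area. Researchers also face the challenge of mining data in very large networks; industry and academia have already obtained effective results with distributed computing models such as MapReduce and BSP. Based on large-scale telecom communication data, this thesis studies multiple features of telecom user groups along three dimensions (topology, gender, and age) and gives parallel algorithms for extracting them. It compares several relational classifiers on the telecom network, uses the attribute information of telecom users to improve the predictions of the traditional collective inference algorithm, which raises accuracy substantially, and gives a parallel algorithm for collective inference. The main work of this thesis is as follows.

Building on the current state of research, and after introducing the main content and results concerning different types of group features, the thesis gives a method for partitioning networks into groups and proposes a system of network community features composed of modularity, node degree distribution, clustering coefficient, and average shortest path length. It proposes parallel implementations of several community feature extraction methods, using a different parallel computing model for each kind of feature.

It proposes a node-centric feature correction framework and presents four relational classifiers and three collective inference algorithms. After analysing the characteristics of each algorithm, it gives a MapReduce parallel version of the relaxation labeling collective inference algorithm, which is well suited to parallelization, for collective inference on large-scale telecom data.

On a telecom communication dataset, it analyses the topological and attribute features of telecom users, such as neighbours, age, gender, numbers of calls and short messages, and call duration, and characterizes human calling and messaging behaviour from both static and dynamic perspectives. It also analyses the homophily of telecom users' communication, that is, the tendency of users to communicate with users similar to themselves; on this basis, telecom operators can classify and locate target customers precisely and carry out precision marketing.

After analysing how different relational classifiers perform on the telecom dataset, the thesis selects the weighted-vote relational neighbor classifier, which has the highest accuracy. Unlike traditional collective inference, it uses not only the topology of the telecom network but also the calling features of users of different genders and ages, revealing the patterns and intrinsic characteristics of how telecom users interact. Combining the relaxation labeling collective inference algorithm with decision tree rules, the improved collective inference algorithm predicts user gender with an accuracy of 93.17% and user age with an accuracy of 90.13%.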
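The four metrics in this feature system have standard definitions. A minimal single-machine sketch, assuming an undirected, unweighted graph stored as an adjacency dict and a community assignment already produced by some community detection step, could compute each of them as below. The function names are hypothetical and not taken from the thesis; modularity uses the form Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Hashable, Mapping, Set

Graph = Mapping[Hashable, Set[Hashable]]  # undirected: u in g[v] iff v in g[u]


def degree_distribution(g: Graph) -> Counter:
    """Number of nodes having each degree."""
    return Counter(len(nbrs) for nbrs in g.values())


def average_clustering(g: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for v, nbrs in g.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count neighbour pairs that are themselves connected (closed triangles at v).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += links / (k * (k - 1) / 2)
    return total / len(g)


def modularity(g: Graph, community: Mapping[Hashable, Hashable]) -> float:
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for a given partition."""
    m = sum(len(nbrs) for nbrs in g.values()) / 2
    intra = Counter()    # 2 * L_c: each internal edge is seen from both endpoints
    deg_sum = Counter()  # d_c: total degree of the nodes in community c
    for v, nbrs in g.items():
        c = community[v]
        deg_sum[c] += len(nbrs)
        intra[c] += sum(1 for u in nbrs if community[u] == c)
    return sum(intra[c] / (2 * m) - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


def average_shortest_path(g: Graph) -> float:
    """Mean BFS hop distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in g:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in g[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

On two triangles joined by a single edge, with each triangle as one community, this gives Q = 5/14, an average clustering coefficient of 7/9, and an average shortest path length of 1.8. The sketch assumes the graph has at least one edge and averages path lengths only over reachable pairs.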

【Abstract】 Recent studies of complex networks provide theoretical models and research methods that help researchers understand real-world networks. Understanding community features gives a better picture of network topology and group characteristics, while feature prediction is a necessary step to ensure that the extracted features are correct. Feature extraction and prediction in complex networks is therefore a challenging and promising area. At the same time, the continued exponential growth in both the volume and the complexity of information poses a new challenge to researchers, and parallel computing platforms such as MapReduce and BSP have emerged in response. The research in this thesis is based on massive telecom data: we present a comprehensive multidimensional study of telecom group features from three aspects (topology, gender, and age) and provide parallel algorithms for extracting these features. After comparing several relational classifiers, we use the communication characteristics of mobile phone users to increase prediction precision greatly, and we provide a parallel algorithm for feature prediction. The main tasks are as follows. Based on current research, after introducing the main content and results of studies on different kinds of group characteristics, we present community detection methods and propose a system of network community features consisting of modularity, node degree distribution, clustering coefficient, and average shortest path length. We propose parallel algorithms for all of these community features, using either the MapReduce or the BSP parallel computing model depending on the feature. We also present a node-centric network learning framework in which a classifier comprises a local classifier, a relational classifier, and a collective inference procedure.
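The thesis's own MapReduce and BSP code is not reproduced here. As a hedged illustration of the MapReduce pattern for one of these features, the sketch below simulates two chained map/shuffle/reduce jobs in plain Python to compute the node degree distribution from an edge list. `run_job` and the mapper and reducer names are hypothetical; on a real cluster the framework, not this loop, would perform the shuffle.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, List, Tuple

KV = Tuple[Hashable, int]


def run_job(records: Iterable, mapper: Callable[..., Iterator[KV]],
            reducer: Callable[[Hashable, List[int]], Iterator[KV]]) -> List[KV]:
    """Simulate one MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in mapper(record):
            groups[key].append(value)       # shuffle: collect values per key
    output: List[KV] = []
    for key in sorted(groups):              # reduce phase; keys must be comparable
        output.extend(reducer(key, groups[key]))
    return output


def degree_mapper(edge: Tuple[Hashable, Hashable]) -> Iterator[KV]:
    """Job 1 map: an undirected edge (listed once) adds 1 to each endpoint's degree."""
    u, v = edge
    yield u, 1
    yield v, 1


def histogram_mapper(node_degree: KV) -> Iterator[KV]:
    """Job 2 map: re-key each (node, degree) pair by its degree."""
    _node, degree = node_degree
    yield degree, 1


def sum_reducer(key: Hashable, values: List[int]) -> Iterator[KV]:
    """Shared reducer: sum all counts for a key."""
    yield key, sum(values)


def degree_distribution_mr(edges: Iterable[Tuple[Hashable, Hashable]]) -> dict:
    """Chain the two jobs: edges -> per-node degree -> degree histogram."""
    degrees = run_job(edges, degree_mapper, sum_reducer)
    return dict(run_job(degrees, histogram_mapper, sum_reducer))
```

The first job emits (node, 1) for both endpoints of every edge and sums per node; the second turns each (node, degree) pair into (degree, 1) and sums per degree. For the edge list of two triangles joined by one edge, it returns {2: 4, 3: 2}.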
After introducing four relational classifiers and three collective inference algorithms, we show that relaxation labeling is suitable for parallelization and present its MapReduce parallel algorithm. We study communication behavior based on the topology of the telecom network and the attributes of mobile phone users, including gender, age, and call and short message records, to uncover hidden patterns in daily human interaction. We find that people tend to communicate more with one another when they are highly similar, and telecom service providers can use this finding to target customers and carry out precise marketing. We choose the weighted-vote relational neighbor classifier (WVRN), which gives the highest prediction precision, to predict features in the telecom network. Besides topology information, we also use the communication features of mobile phone users in the relational model. Combining WVRN with a communication decision tree, we achieve 93.17% precision in gender prediction and 90.13% precision in age prediction.
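WVRN estimates a node's class distribution as the weighted average of its neighbours' current distributions, and relaxation labeling updates all unlabeled nodes simultaneously from the previous round's estimates. A minimal sketch follows, assuming edge weights such as call counts, a damping schedule β_{t+1} = k·β_t with illustrative defaults β0 = 1.0 and k = 0.99, and a fixed number of rounds; these choices are assumptions, all names are hypothetical, and the thesis's decision-tree rules are not reproduced.

```python
from typing import Dict, Hashable, List, Mapping

Node = Hashable
Dist = Dict[str, float]                        # class label -> probability
Weights = Mapping[Node, Mapping[Node, float]]  # node -> {neighbour: edge weight}


def wvrn_estimate(node: Node, weights: Weights,
                  estimates: Mapping[Node, Dist], classes: List[str]) -> Dist:
    """WVRN: weighted average of the neighbours' current class distributions."""
    scores = {c: 0.0 for c in classes}
    total = 0.0
    for nbr, w in weights[node].items():
        for c in classes:
            scores[c] += w * estimates[nbr][c]
        total += w
    if total == 0.0:
        return dict(estimates[node])  # no neighbours: keep the current estimate
    return {c: s / total for c, s in scores.items()}


def relaxation_labeling(weights: Weights, known: Mapping[Node, str],
                        classes: List[str], prior: Dist, iterations: int = 50,
                        beta0: float = 1.0, k: float = 0.99) -> Dict[Node, Dist]:
    """Synchronous relaxation labeling with WVRN and a decaying damping factor."""
    # Labeled nodes get a one-hot distribution; unlabeled nodes start from the prior.
    estimates: Dict[Node, Dist] = {
        v: ({c: 1.0 if c == known[v] else 0.0 for c in classes}
            if v in known else dict(prior))
        for v in weights
    }
    beta = beta0
    for _ in range(iterations):
        beta *= k  # beta_{t+1} = k * beta_t (assumed schedule)
        updated = dict(estimates)
        for v in weights:
            if v in known:
                continue  # observed labels stay fixed
            fresh = wvrn_estimate(v, weights, estimates, classes)  # reads round t only
            updated[v] = {c: beta * fresh[c] + (1 - beta) * estimates[v][c]
                          for c in classes}
        estimates = updated
    return estimates
```

For example, an unlabeled user tied by weights 2 and 1 to two known "F" users and by weight 1 to a second unlabeled user, who is in turn tied by weight 3 to a known "M" user, converges to about 0.8 for "F" while the second user converges to about 0.8 for "M". Because each round depends only on the previous round's estimates, one round can be expressed as a single map/reduce pass, which is one reason this scheme parallelizes well.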