The Research of Multi-Relation Social Network Visual Analytic System

(Original title: 多关系社会网络分析和可视化系统的研究)

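【Author】 Suo Lijun (索利军)

【Advisor】 Wu Bin (吴斌)

【Author information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2010, Master's thesis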

【Abstract】 Traditional data mining techniques (classification, clustering, association analysis, and so on) focus on the attributes of dimension tables but ignore the relationships that exist between records. Conversely, mainstream network analysis methods concentrate on network topology and overlook the attributes of the nodes themselves. This thesis proposes the multi-relation social network (MRSN), a heterogeneous network model built to preserve as much of the information in the raw data as possible, and studies it in the following respects:

(1) Multi-relation network modeling and network extraction. After modeling real-world data as a multi-relation network, we define extraction operators that derive a single-relation (homogeneous) network with a specific meaning from the multi-relation network.

(2) Entity resolution in MRSN. Data collected from multiple sources must be integrated and pre-processed before knowledge discovery models can use it accurately. Merging data from several sources into a single data set often produces many duplicate records: records that are not syntactically identical but represent the same real-world entity. Merging these duplicates correctly is an essential step in producing high-quality data, and the process is known as entity resolution. Building on attribute matching, we add link analysis over the multiple relations in the network to improve the accuracy of entity resolution.

(3) Community detection is an important tool for studying complex networks, but current algorithms rely mainly on network topology and neglect the content of the nodes. We propose an algorithm called CDNA that uses node attributes in addition to topology to find communities, with the aim of further improving the accuracy of the partition.

(4) Visualization, meaning software systems that provide statistical or interactive visual representations to help people explore and interpret data, is a very important step in the data mining process. We study the visualization of multi-relation social networks, design different views for different types of networks, introduce the concept of "network browsing", and apply it within a framework for browsing large-scale networks.

(5) Finally, the research above was applied in the National Key Technology R&D Program project "Key Technology Research and Application Demonstration of Science and Technology Literature Information Service Systems" (《科技文献信息服务系统关键技术研究及应用示范》) to develop a scientific and technical information visual analytics system, LiterMiner, which demonstrates the feasibility of the approach.
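The single-relation extraction described in point (1) of the abstract can be illustrated with a minimal sketch. The abstract does not define the thesis's actual operators, so everything below is an assumption made for illustration: the edge-list representation, the sample data, and the function names `extract_relation` and `project_shared_target` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical multi-relation edge list: (source, relation, target).
EDGES = [
    ("alice", "writes", "paper1"),
    ("bob", "writes", "paper1"),
    ("bob", "writes", "paper2"),
    ("carol", "writes", "paper2"),
    ("paper1", "cites", "paper2"),
]


def extract_relation(edges, relation):
    """Keep only the edges of one relation type, as (source, target) pairs."""
    return {(s, t) for s, r, t in edges if r == relation}


def project_shared_target(edges, relation):
    """Link two sources when they share a target under `relation`,
    for example co-authorship derived from "writes" edges."""
    sources_by_target = defaultdict(set)
    for s, r, t in edges:
        if r == relation:
            sources_by_target[t].add(s)
    pairs = set()
    for sources in sources_by_target.values():
        # Sort so each undirected pair is stored in one canonical order.
        for a, b in combinations(sorted(sources), 2):
            pairs.add((a, b))
    return pairs


citations = extract_relation(EDGES, "cites")
coauthors = project_shared_target(EDGES, "writes")
```

Here `citations` is a paper-to-paper network and `coauthors` is an author-to-author network, both derived from the same heterogeneous edge list. This is one plausible reading of "extracting a single-relation network with a specific meaning", not the thesis's definition.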
