Social Search on Complex Online Networks

(Original title: 基于复杂在线网络的社会化搜索)

【Author】 Huang Lailei (黄来磊)

【Advisor】 Xia Zhengyou (夏正友)

【Author Information】 Nanjing University of Aeronautics and Astronautics, Computer Application Technology, 2009, Master's thesis

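【Abstract (Chinese)】 There are two different perspectives in information retrieval: computer-centered and human-centered. The former treats the information retrieval problem as consisting of building effective indexes, processing user queries with high performance, and developing ranking algorithms that improve the “quality” of the result set. The human-centered perspective holds that information retrieval includes studying user behavior, understanding users' main needs, and determining how that understanding affects the composition and operation of a retrieval system. The perspective represented by search engines suffers from problems such as large amounts of information that cannot be retrieved, difficulty in expressing query needs, and the coarse granularity of returned results, yet it remains the most influential one in the field. The Web 2.0 wave and the rise of the “PeopleWeb” offer a new opportunity for the human-centered retrieval paradigm, and that opportunity comes from social search. The core representative of Web 2.0 is the social network service. This thesis takes Xiaonei (校内网), China's largest real-identity social network service for university students, as its research object, with the aim of examining possible human-centered information retrieval paradigms. The main work includes:
1. Collecting and building a Xiaonei user dataset. The dataset contains the profile page information of 34,085 registered users from Nanjing University of Aeronautics and Astronautics. It covers three kinds of information: (1) basic personal data such as name, gender, hometown, and high school; (2) self-presentation information such as avatar, photo albums, and blog entries; (3) social interaction information, including friend lists, message exchanges, and gift exchanges. The dataset facilitates research in fields such as data mining and behavioral science.
2. Analyzing users' behavioral roles and interaction patterns. Based on Xiaonei's message board feature, a message-interaction social network is constructed, and a data mining algorithm identifies clearly distinct user categories: “outgoing”, “reciprocal”, and “incoming” users. Further clustering identifies the “popular stars” of Xiaonei, and the relationship between user category and prestige is examined. Building on this classification, a chi-square test confirms that significant interaction patterns exist among the different types of users.
3. Analyzing users' need types and interest distribution, and building a simulation model for search experiments. Based on Xiaonei's group feature and the group lists in the user data, the number of users per group is found to follow a power law. All user interests are further divided into “mass needs”, “niche needs”, and “individual needs”, and on this basis a simulation model is built to run expertise-location searches. The experiments show that the weak-tie and outgoing-type search strategies are effective under “individual needs”.
4. Developing SNSAnalyzer, a social network site analysis system written in Java. The system comprises a web crawler module, a multi-agent simulation module, a social network analysis module, and a social network visualization module.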

【Abstract】 There are two points of view in the field of information retrieval: the computer-centered perspective and the human-centered perspective. The former frames information retrieval as a matter of building document indexes efficiently, processing user queries, and designing ranking algorithms that improve the quality of the result set. According to the latter, information retrieval is framed by studying human behavior, understanding the needs of users, and judging how these affect the construction and operation of a retrieval system. The computer-centered view, represented by search engine technology, is the dominant one, although it suffers from problems such as the difficulty of indexing the entire web, the difficulty of expressing an information need in keywords, and the size of the result set. However, the “Web2.0” revolution and the rise of the “PeopleWeb” give a new opportunity to the human-centered information retrieval paradigm. This new opportunity is called “social search”. One of the core applications of the Web2.0 revolution is the social network service. In this thesis, xiaonei.com, the largest social network service for campus users in China, is examined in order to study the potential for designing a human-centered information retrieval system. The main contributions are as follows:
1. Collecting and building a dataset based on xiaonei.com. The dataset is collected from the profile pages of 34,085 users at Nanjing University of Aeronautics and Astronautics. It contains three types of information: (1) demographic characteristics of users, such as name, gender, hometown, and high school; (2) self-presentation information, such as profile picture, albums, and weblogs; (3) information produced by interpersonal communication, such as making friends, leaving messages, and exchanging virtual gifts. The dataset can facilitate research in fields such as data mining and human behavior.
2. Analyzing the behavioral roles and communication patterns of users. A clustering algorithm applied to the social network constructed from message interaction clearly identifies three categories of users: “outgoing”, “reciprocal”, and “incoming”. Further clustering identifies the “popular stars”, and the relationship between user category and prestige is examined. A chi-square test for independence demonstrates that significant communication patterns exist among the different types of users.
3. Analyzing the distribution of users' needs and interests and constructing a simulation model. The interests of users are studied through the group feature of xiaonei.com, and the number of users per group is found to follow a power-law distribution. These interests are then classified into three categories: “mass need”, “group need”, and “individual need”. A simulation model is constructed on the basis of this taxonomy. An expertise location experiment demonstrates the effectiveness of the weak-tie query propagation strategy and the outgoing-type strategy for individual needs.
4. Building the SNSAnalyzer system in the Java programming language. The system has four major components: a web page crawler, a multi-agent simulator, a social network analysis component, and a social network visualizer.
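The role analysis in item 2 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the clustering algorithm, so a simple sent-to-received ratio threshold stands in for it here, and the function names, the `ratio` parameter, and the sample message log are all hypothetical. The sketch labels each user as outgoing, reciprocal, or incoming, cross-tabulates messages by sender role and receiver role, and computes the Pearson chi-square statistic for independence.

```python
from collections import Counter

ROLES = ("outgoing", "reciprocal", "incoming")


def classify_users(messages, ratio=2.0):
    """Label users by the balance of messages sent vs. received.

    Assumption: a fixed ratio threshold replaces the (unspecified)
    clustering algorithm used in the thesis.
    """
    sent = Counter(sender for sender, _ in messages)
    received = Counter(receiver for _, receiver in messages)
    roles = {}
    for user in set(sent) | set(received):
        out_n, in_n = sent[user], received[user]
        if out_n > ratio * in_n:
            roles[user] = "outgoing"
        elif in_n > ratio * out_n:
            roles[user] = "incoming"
        else:
            roles[user] = "reciprocal"
    return roles


def contingency_table(messages, roles):
    """Count messages by (sender role, receiver role)."""
    table = {s: {r: 0 for r in ROLES} for s in ROLES}
    for sender, receiver in messages:
        table[roles[sender]][roles[receiver]] += 1
    return table


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_tot.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            if expected > 0:  # skip cells in empty rows or columns
                stat += (table[r][c] - expected) ** 2 / expected
    # Only non-empty rows and columns contribute degrees of freedom.
    used_rows = sum(1 for r in rows if row_tot[r] > 0)
    used_cols = sum(1 for c in cols if col_tot[c] > 0)
    dof = max(used_rows - 1, 0) * max(used_cols - 1, 0)
    return stat, dof


if __name__ == "__main__":
    # Hypothetical (sender, receiver) message log.
    log = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a"), ("c", "b")]
    user_roles = classify_users(log)
    stat, dof = chi_square(contingency_table(log, user_roles))
    print(user_roles, stat, dof)
```

For a full 3×3 role table (4 degrees of freedom), a statistic above 9.488 rejects independence at the 5% significance level. A real analysis would need far more messages than this toy log for the test to be meaningful.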
