
面向Web3.0的大众分类研究

Research on Folksonomy Oriented to Web3.0

【Author】 Xiong Huixiang (熊回香)

【Advisor】 Wang Xuedong (王学东)

【Author information】 Central China Normal University, Information Science, 2011, PhD

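【Abstract, translated from the Chinese】 With the development of social software, a growing number of Web2.0 websites are influencing and changing how people work and study, through the openness of their applications, the pervasiveness of their technology and the interactivity of their information dissemination, together with advantages such as read-write expression, socialized association and a convenient user experience. At the same time, Web2.0's own openness, decentralization, aggregation, high interactivity and innovativeness draw more and more users into creating and publishing information online. However, the surge in information sources and information volume has produced disordered information, lower information purity and credibility, and declining search engine precision, while users urgently want to obtain information and knowledge that meets their needs, promptly and accurately, through convenient interaction and collaborative sharing. In response, the Web3.0 model has emerged, emphasizing information filtering and personalized information aggregation around the core idea of "personalization, precision and intelligence", and it brings new hope for solving these problems.

Because Web3.0 builds on Web2.0, folksonomy, the new information classification method born in the Web2.0 environment, remains one of the main information classification methods under Web3.0. But while folksonomy gives users convenient, free tagging and retrieval, it also has defects such as the diversity and ambiguity of tags, their flat structure and the lack of semantic relations among them. These defects hold back the realization of Web3.0's core idea, so it is both necessary and significant to study the optimization of the folksonomy system starting from what Web3.0 requires of it. To that end, this dissertation draws on theories from sociology, linguistics, mathematical statistics and computer science, uses empirical analysis, statistics, social network analysis and data mining, and makes full use of Chinese corpus resources to study the optimization of folksonomy. It has eight chapters.

Chapter 1 sets out the background, state of research and significance of the topic, states the research aims and content, introduces the research methods and approach, and summarizes the main innovations.

Chapter 2 briefly reviews the relevant theory. It first describes the definition and connotation, operating mechanism, types and basic characteristics of folksonomy; then gives a summary introduction to the basic ideas, architecture and ontology of the Semantic Web; then analyzes and summarizes the emergence, connotation and characteristics, supporting technologies and current state of Web3.0; and finally examines the relationship among folksonomy, the Semantic Web and Web3.0, providing direction for the later research.

Chapter 3 first analyzes the connotation and characteristics of tags, then conducts an empirical study of tags through a typical Chinese Web2.0 website, mainly analyzing the linguistic features of tags, their distribution patterns, the relationships among tags, users and resources, tag quality and standardization, tag classification systems and tag recommendation. This clarifies the operating mechanism and shortcomings of folksonomy and provides a basis for the later research.

Chapter 4 first compares folksonomy with the controlled languages of information retrieval. It then describes how a tag library is built, introducing the Chinese semantic dictionary Tongyici Cilin (《同义词词林》) and normalizing the terms in the tag library by comparing word similarity. It next discusses the control of user tagging in terms of tag recommendation, user management mechanisms and spam tag handling, and finally analyzes the mechanism for selecting preferred user tags. The aim of the chapter is to improve tag quality and lay a foundation for mining the semantics between tags.

Chapter 5 first analyzes in detail the process of automatically classifying Tag resources and builds an algorithmic model for it. Borrowing from automatic text classification, the model represents Tag resources in a vector space by tag usage frequency and brings Tongyici Cilin into the vector representation to express semantics, thereby improving classification accuracy. The chapter then analyzes how to use the tag library to build a tag hierarchy and uses the content management system Drupal to show how a tag hierarchy can be implemented. Its main purpose is to combine ideas from traditional classification to build a tag hierarchy, offering users tags and navigation and improving retrieval efficiency.

Chapter 6 first analyzes tag clustering and related algorithms, then discusses the automatic clustering of tags or Tag resources from three angles: tag co-occurrence analysis, vector representation of tags, and association rule mining. In choosing clustering algorithms it weighs the strengths of different algorithms and selects different ones for different data models, and it verifies them in theory on a sample tag data set, demonstrating their feasibility. This work lays the foundation for combining tags with ontologies.

Chapter 7 first discusses the construction of a tag concept space, applying different algorithms to build a hierarchical space and a network space of tags and demonstrating their usability and effectiveness with examples. It then analyzes in detail the mechanisms and methods for mapping tags to ontologies, introducing another Chinese semantic dictionary, HowNet (《知网》), to compare the similarity between concepts during concept matching, which makes the matching more reliable. Finally, starting from the semantic control of tags, it proposes the concept of a tag ontology model, compares current typical tag ontology models in detail, and takes the SIOC ontology model as an example to introduce the semantic control process for tags. The chapter centers on extracting semantic relations among tags, and its results contribute to the eventual realization of the Semantic Web (Web3.0).

Chapter 8 concludes the dissertation, summarizing its main content, conclusions and innovations, examining the shortcomings of the research, and on that basis looking ahead to future research priorities and directions.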

【Abstract】 With the development of social software, more and more Web2.0 websites, characterized by open applications, pervasive technology and interactive information dissemination, influence and change the way people work and study. Meanwhile, attributes of Web2.0 such as openness, user participation, rich user experience and decentralization lead more people to create information and publish it on the Internet. However, as information sources and information volume proliferate, problems arise: information becomes chaotic, its purity and credibility decrease, and the accuracy of search engines declines. At the same time, users hope to acquire what they need promptly and accurately through interactive, collaborative and convenient means. In response to these problems Web3.0 has emerged, focusing on information filtering and personalized aggregation, with personalization, precision and intelligence as its core idea. Its emergence brings new hope for solving these problems.

Because Web3.0 builds on Web2.0, folksonomy, which developed in the Web2.0 environment, is still one of the main methods of information classification. Although it provides users with convenient, free tagging and information retrieval, it also has defects, such as the diversity, fuzziness and flat structure of tags and the lack of semantic relations among them. These defects restrict the realization of Web3.0's core idea, so it is essential and of great importance to study the optimization of folksonomy in light of the new requirements of Web3.0. On this basis, integrating theories from sociology, linguistics, mathematics and computer science, and using empirical analysis, statistics, social network analysis and data mining, this paper makes full use of Chinese corpora to study the optimization of folksonomy. The main content comprises eight chapters.

Chapter 1 describes the background of the topic, the state of research and the significance of the research, then sets out the research purpose and content, introduces the research methods and approach, and lists the main innovations of the research.

Chapter 2 is a brief review of the theories involved in the research. First, it describes the definition and connotation, operating mechanism, types and basic features of folksonomy. It then gives a summary introduction to the basic ideas and architecture of the Semantic Web and to ontology, and goes on to analyze and summarize the emergence, connotation, features, supporting technology and status of Web3.0. Lastly, it analyzes the relationship among folksonomy, the Semantic Web and Web3.0, providing an orientation for the study that follows.

Chapter 3 first analyzes the connotation and features of tags. Through a typical Chinese Web2.0 website, it carries out an empirical study of tags, focusing mainly on the linguistic features of tags, their distribution patterns, the relationships among tags, users and resources, the quality and standardization of tags, the taxonomy of tags and the recommendation of tags. It thereby identifies the operating mechanism and deficiencies of folksonomy, providing a basis for the line of thought in the research that follows.

Chapter 4 first compares folksonomy with the controlled languages of information retrieval, analyzing their similarities and differences. It then describes how the tag library is constructed, introducing the Chinese semantic dictionary Tongyici Cilin into the construction and standardizing the terms in the tag library by comparing the similarity between terms. Next, it discusses the control of user annotation from several aspects, such as tag recommendation, the user management mechanism and the handling of spam tags. Finally, it analyzes the mechanism for selecting the best user tags. The purpose of this chapter is to improve the quality of tags, so as to lay the foundation for mining the semantics between tags.

Chapter 5 first analyzes in detail the automatic categorization process for tag resources and constructs an algorithmic model of it. Borrowing the idea of automatic text categorization, the model uses tag frequency to represent resources in a vector space and brings in the synonym dictionary Tongyici Cilin to express the semantics of the vectors, so as to improve the accuracy of automatic classification. The chapter then studies how to use the tag library to construct a tag hierarchy and, through the content management system Drupal, introduces a way of implementing the tag hierarchy. Lastly, it analyzes the idea of facet matching. This chapter mainly combines traditional classification thinking with the construction of a tag hierarchy, to provide tags and navigation for users and improve retrieval efficiency.

Chapter 6 first studies clustering analysis and related algorithms. It then discusses the automatic clustering of tags or tag resources from three aspects: tag co-occurrence analysis, the vector representation of tags, and association rule mining. In choosing clustering algorithms, it weighs the advantages of different algorithms, applies different algorithms to different data models, and carries out a theoretical verification on a sample tag data set to show the feasibility of these algorithms. This part of the research lays a foundation for combining tags with ontology.

Chapter 7 first discusses and analyzes the construction of the concept space of tags, applying different algorithms to construct the hierarchical space and the mesh space of tags respectively, and demonstrates their usability and validity with examples. It then analyzes the mechanism and methods for mapping tags to ontology, and in the process of concept matching it brings in another Chinese semantic dictionary, HowNet, to compare the similarity between concepts and increase the reliability of the matching. Lastly, starting from the semantic control of tags, it puts forward the concept of a tag ontology model and compares the current typical tag ontology models in detail. Taking the SIOC ontology model as an example, it also introduces the semantic control process for tags. The chapter is mainly about extracting the semantic relationships among tags, and its results contribute to the eventual realization of the Semantic Web (Web3.0).

Chapter 8 summarizes the main research content, conclusions and innovations of the paper, analyzes the shortcomings of the research, and finally looks ahead to future research priorities and directions.
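
As a rough illustration of the idea summarized for Chapter 5 (representing a tagged resource by its tag frequencies, after folding synonymous tags together), the following Python sketch assigns a resource to the category whose tag profile is most similar under cosine similarity. It is not the dissertation's algorithm: the `SYNONYMS` map is a tiny hypothetical stand-in for a synonym dictionary such as Tongyici Cilin, the category profiles and tags are invented, and cosine similarity with a nearest-profile rule is one assumed choice among many.

```python
from collections import Counter
from math import sqrt

# Hypothetical synonym map standing in for a real synonym dictionary
# (the abstract mentions Tongyici Cilin; nothing here is taken from it).
SYNONYMS = {"pc": "computer", "laptop": "computer", "airfare": "flight"}

# Invented category profiles: tag -> usage frequency within the category.
PROFILES = {
    "technology": Counter({"computer": 5, "software": 3}),
    "travel": Counter({"hotel": 4, "flight": 2}),
}


def normalize(tag):
    """Lower-case a tag and map it to its canonical synonym, if any."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)


def tag_vector(tags):
    """Build a sparse tag-frequency vector from a resource's tags."""
    return Counter(normalize(t) for t in tags)


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tags, profiles):
    """Return the category whose profile is most similar to the resource."""
    vec = tag_vector(tags)
    return max(profiles, key=lambda name: cosine(vec, profiles[name]))


if __name__ == "__main__":
    # "pc" and "laptop" both fold into "computer" before comparison.
    print(classify(["pc", "laptop", "software"], PROFILES))  # technology
```

Folding synonyms before counting is what lets "pc" and "laptop" reinforce the same dimension; without it, the resource would share only the "software" dimension with the technology profile.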
