网络环境下的领域知识挖掘

Domain Knowledge Mining in Network Environments

【Author】 王萍 (Wang Ping)

【Advisor】 张际平 (Zhang Jiping)

【Author information】 East China Normal University, Educational Technology, 2010, Ph.D.

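【Abstract (from the Chinese)】 The massive, heterogeneous, and rapidly growing resources on the web have created a tension between "data surplus" and "knowledge scarcity", making it harder for people to obtain useful knowledge in time. Taking heterogeneous data sources in the network environment as its object of study, this thesis explores the feasibility of knowledge discovery across various kinds of data. Following a "model proposal, algorithm implementation, data validation" approach, it studies how to use and mine web data resources effectively to obtain latent, valuable domain knowledge.

1. A domain knowledge mining model for the network environment is proposed. The model has three layers (data, knowledge, and application) and guides the mining of multidimensional knowledge from heterogeneous data sources to support a range of knowledge applications. Based on this model, the thesis carries out three concrete studies of domain knowledge mining, on online scientific literature, blog posts, and social annotations.

2. A new probabilistic topic model, the Topic-Author model, is proposed. It jointly models the textual and author information of scientific literature, enabling deeper analysis of the literature. Based on this model, a multidimensional literature knowledge mining framework is built for discovering and applying domain knowledge, including concept mining, expert finding, literature recommendation, research trend analysis, and topic relationship mining.

3. A blog knowledge mining framework is proposed for topic mining, opinion analysis, and diffusion research. Two text analysis methods, text clustering and topic modeling, are used to mine latent concepts in blog post content and to analyze opinions on them. Diffusion models for social networks are studied, methods for maximizing diffusion are summarized, and an improved threshold diffusion model is proposed.

4. The collective intelligence of social annotation and the knowledge organization classification schemes used in the web environment are analyzed, semantic knowledge is mined from social annotations, and a lightweight ontology construction method is proposed. The method relies on the proposed social tag clustering algorithm, which is based on weighted network partitioning, to perform semantic clustering and semantic layering.

The results show that the proposed domain knowledge mining methods can discover a large amount of valuable, latent, multidimensional knowledge, provide users with a variety of knowledge application services, and support knowledge acquisition and learning in the information age.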
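To make the threshold diffusion idea in item 3 more concrete, the following is a minimal sketch of the standard linear threshold model, not the improved model proposed in the thesis, whose details are not given in this record. The function name `linear_threshold`, the toy graph, the influence weights, and the thresholds are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical toy network (not from the thesis).
# influence[v][u] is the weight of u's influence on v; the incoming
# weights of each node sum to at most 1, as in the standard model.
influence: Dict[str, Dict[str, float]] = {
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.4},
    "d": {"c": 0.5},
    "e": {"d": 0.2},
}

# Hypothetical activation thresholds, one per non-seed node.
thresholds: Dict[str, float] = {"b": 0.5, "c": 0.6, "d": 0.5, "e": 0.9}


def linear_threshold(
    influence: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
    seeds: Set[str],
) -> Set[str]:
    """Return the set of nodes that end up active, starting from `seeds`."""
    active = set(seeds)
    changed = True
    # Repeat until no new node activates. Activation is monotone (nodes
    # never deactivate), so the final set does not depend on update order.
    while changed:
        changed = False
        for node, incoming in influence.items():
            if node in active:
                continue
            # Total influence from neighbours that are already active.
            pressure = sum(w for nbr, w in incoming.items() if nbr in active)
            if pressure >= thresholds[node]:
                active.add(node)
                changed = True
    return active


if __name__ == "__main__":
    print(sorted(linear_threshold(influence, thresholds, {"a"})))
```

In this toy network, seeding node `a` activates `b`, then `c`, then `d`, while `e` stays inactive because its only active neighbour contributes 0.2 against a threshold of 0.9. Diffusion maximization, as mentioned in the abstract, would then amount to choosing the seed set that yields the largest final active set.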

【Abstract】 The massive, heterogeneous, and fast-growing data resources in current web environments have created a contradiction between being "data rich" and "knowledge poor", which makes it harder to acquire latent and valuable domain knowledge. The thesis takes heterogeneous web data sources as its object of study and explores the feasibility of discovering knowledge from them. Following the approach of "model proposal, algorithm implementation, and data verification", it studies in practice how to mine latent and valuable domain knowledge from web data resources.

1. The thesis proposes a domain knowledge mining model for network environments. The model has three layers, from bottom to top: a data layer, a knowledge layer, and an application layer. It guides the mining of multidimensional knowledge. Based on the model, the thesis carries out practical studies of domain knowledge mining on three types of data: scientific literature, blog posts, and social annotations.

2. The thesis proposes a new probabilistic topic model, the Topic-Author model, which jointly models the content and author information of the literature. Based on the model, a domain knowledge mining framework is proposed for multidimensional analysis and domain knowledge discovery, including concept discovery, expert finding, article recommendation, trend analysis, and correlation identification.

3. The thesis proposes a blog knowledge mining framework for studying topic mining, opinion analysis, and information diffusion. Latent concepts are discovered through text clustering and topic modeling, and opinions on those concepts are analyzed. Based on a study of social network diffusion models, the thesis summarizes methods for maximizing information diffusion and proposes an improved threshold model.

4. The thesis analyzes the collective intelligence in social annotations and different knowledge classification methods in the web environment. A lightweight ontology construction method is proposed to discover semantic knowledge in social annotations. The method uses a social tag clustering algorithm based on weighted network division to perform semantic clustering and semantic layering.

The results show that the domain knowledge mining methods proposed in this thesis can discover a large amount of valuable, latent knowledge, provide multiple knowledge services, and support knowledge acquisition and learning.
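The social tag clustering described in item 4 can be illustrated with a small sketch. This record does not specify the thesis's algorithm, so the code below swaps in a deliberately simple stand-in: it builds a weighted tag co-occurrence network and partitions it by dropping edges below a weight cutoff and taking connected components. The function names, the `min_weight` parameter, and the toy annotations are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each inner list is the tag set one user
# attached to one resource (not taken from the thesis).
annotations: List[List[str]] = [
    ["python", "programming", "tutorial"],
    ["python", "programming"],
    ["recipe", "cooking"],
    ["recipe", "cooking", "tutorial"],
]


def cooccurrence_weights(annotations: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count how often each pair of tags appears on the same resource."""
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for tags in annotations:
        # Sort so each unordered pair is stored under one canonical key.
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def cluster_tags(annotations: List[List[str]], min_weight: int = 2) -> List[Set[str]]:
    """Split the tag network into clusters by removing weak edges.

    Edges with weight below `min_weight` are dropped; the connected
    components of what remains are returned as clusters.
    """
    weights = cooccurrence_weights(annotations)
    tags = {t for tagset in annotations for t in tagset}
    adjacency: Dict[str, Set[str]] = {t: set() for t in tags}
    for (a, b), w in weights.items():
        if w >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters: List[Set[str]] = []
    seen: Set[str] = set()
    for start in sorted(tags):  # sorted for a deterministic cluster order
        if start in seen:
            continue
        # Depth-first traversal to collect one connected component.
        stack, component = [start], set()
        while stack:
            tag = stack.pop()
            if tag in component:
                continue
            component.add(tag)
            stack.extend(adjacency[tag] - component)
        seen |= component
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    for cluster in cluster_tags(annotations, min_weight=2):
        print(sorted(cluster))
```

With the toy data and `min_weight=2`, `tutorial` ends up alone because it co-occurs only once with each other tag, while the programming and cooking tags form separate clusters. Applying the same step repeatedly with different cutoffs would be one simple way to obtain nested clusters, loosely analogous to the semantic layering the abstract mentions.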

  • 【Classification code】G434
  • 【Times cited】7
  • 【Downloads】1513