
Text Mining and Its Application to Intelligent Retrieval in UDDI Registry

【Author】 Tan Dekun (谭德坤)

【Advisor】 Ding Zhiqiang (丁志强)

【Author Information】 Kunming University of Science and Technology, Computer Application Technology, 2004, Master's thesis

【Abstract】 As Web Services technology matures, the volume of Web Service information stored in UDDI Registry will keep growing, and retrieving the Web Services that meet a user's needs quickly, conveniently, and accurately from this mass of information will become very important. Traditional retrieval based on keyword matching can no longer locate information as accurately and completely as users require. This thesis therefore takes the textual descriptions of Web Services as its research object and proposes an intelligent information retrieval technique for UDDI Registry.

Representing the document collection by its features is the prerequisite and foundation of text mining and information retrieval. This thesis uses a frequent sequential pattern mining algorithm to discover extended phrases, which serve as the feature terms of a document, and then applies a concept rank algorithm and the HITS algorithm to extract the theme concepts of the documents. Each document is then represented by these theme concepts.

The core of intelligent retrieval is concept retrieval and personalized service. Concept retrieval requires discovering the concepts in a domain and the relations among them, that is, building a concept space. This thesis applies text mining techniques to the documents a user has accessed in order to build that user's private concept space; the core algorithms are an improved K-Means document clustering algorithm and the FP-tree-based frequent pattern growth (FP-growth) algorithm. Because the concept space is generated from the user's accessed documents, it also carries the user's personal preferences, so concept retrieval delivers personalized service as well.

Concept retrieval is the concrete embodiment of intelligent retrieval. To help users express their query intent more accurately during concept retrieval, this thesis uses a Hopfield neural network algorithm to perform concept association on the user's query keywords and returns the associated concepts to the user for further feedback. For the query representation obtained after user feedback and the document feature representation, the thesis gives a concept-matching computation and discusses how the retrieval results should be organized.

Finally, to verify these results, the thesis proposes a model of an intelligent retrieval system that integrates the components above and presents a concrete worked retrieval example.
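
The concept-association step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the network formulation, so what follows is only an assumption: a Hopfield-style spreading activation over a weighted concept space, in which query concepts are clamped to full activation and every other concept is updated through a sigmoid of its weighted input until the state stops changing. The function name `associate_concepts`, the parameters `theta` and `scale`, and the example weights are all hypothetical and are not taken from the thesis.

```python
import math


def associate_concepts(weights, query_terms, theta=0.5, scale=0.1,
                       epsilon=1e-4, max_iters=50, top_k=3):
    """Rank concepts associated with query_terms (illustrative sketch only).

    weights[i][j] is an assumed non-negative association strength from
    concept i to concept j; theta and scale are assumed sigmoid parameters.
    """
    concepts = sorted(set(weights) | {c for nbrs in weights.values() for c in nbrs})
    query = set(query_terms)
    # Initial state: query concepts fully active, all others inactive.
    activation = {c: (1.0 if c in query else 0.0) for c in concepts}

    for _ in range(max_iters):
        updated = {}
        for j in concepts:
            if j in query:
                updated[j] = 1.0  # keep the user's own keywords clamped
                continue
            # Weighted input from every concept that links to j.
            net = sum(weights.get(i, {}).get(j, 0.0) * activation[i]
                      for i in concepts)
            updated[j] = 1.0 / (1.0 + math.exp(-(net - theta) / scale))
        change = sum(abs(updated[c] - activation[c]) for c in concepts)
        activation = updated
        if change < epsilon:  # the network state has settled
            break

    ranked = sorted((c for c in concepts if c not in query),
                    key=lambda c: activation[c], reverse=True)
    return [(c, activation[c]) for c in ranked[:top_k]]


# Hypothetical concept space for a user interested in payment services.
example_weights = {
    "payment": {"credit card": 0.8, "invoice": 0.6, "weather": 0.05},
    "credit card": {"payment": 0.8, "fraud check": 0.7},
    "invoice": {"payment": 0.6},
    "fraud check": {"credit card": 0.7},
    "weather": {"payment": 0.05},
}

suggestions = associate_concepts(example_weights, ["payment"])
for concept, score in suggestions:
    print(f"{concept}: {score:.3f}")
```

In this sketch the ranked suggestions would be shown to the user, who confirms or rejects them before the refined query is matched against the documents' theme concepts. Clamping the query concepts is a design choice made here so that the user's original keywords are never suppressed by the network dynamics.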

  • 【Classification Code】TP391.3
  • 【Downloads】150