面向个性化主题搜索的用户—查询词语义本体构建

Construction of User-Query Semantic Ontology (UQSO) for Personalized Topic Search Engine

【Author】 Feng Mingli (冯明丽)

【Supervisor】 Du Yajun (杜亚军)

【Author information】 Xihua University, Computer Software and Theory, 2010, Master's degree

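【Summary】 Because user queries are short and semantically ambiguous, most search engines currently struggle to understand them. How can a topic retrieval system accurately understand the information need a user submits while also holding semantic knowledge about the sources it searches? And how can it automatically return accurate, relevant information tailored to each user, both when different users enter the same query keywords and when the same user enters different query keywords? These are the main questions this thesis addresses. Most search engines collect large user query logs. These logs record users' past queries and clicks and reflect, to varying degrees, their interests and domain knowledge; the more records a user has, the more accurately that user's domain knowledge can be characterized. An ontology has a well-defined concept hierarchy and supports logical reasoning. It can express semantics through the relations between concepts, so it provides a good knowledge base for semantic retrieval and concept retrieval. Lexical databases such as WordNet contain a large number of synonyms and near-synonyms, together with is_a and part_of relations between words, all of which reflect the knowledge of domain experts. Using rich query-log information together with the semantic relations in WordNet to give topic retrieval an ontology-structured semantic background therefore opens up broad possibilities for developing a new generation of personalized topic information retrieval systems. It is particularly important to study the relations between users' query words and the pages they clicked, as recorded in the historical knowledge base, and to build a model of the semantic relations among a user's query words that reflects that user's personalized knowledge. The main contents of this thesis are as follows.

First, the thesis proposes a novel personalized semantic clustering method for query words, which classifies a user's query words into topics according to the user's personal interests and background knowledge. Search engine query logs contain a wealth of historical access records that reflect, to varying degrees, user interests and domain knowledge. The thesis defines three kinds of semantic similarity relations between query words based on the query log: similarity based on the query words themselves, similarity based on the user's query click sequences, and similarity based on the content of the documents the user clicked. By analysing these three relations, it proposes a novel method for computing the semantic similarity of user query words, derives a cluster similarity function from that measure, and applies a hierarchical agglomerative clustering algorithm to group the query words into the semantic topics reflected in the query log. This largely eliminates the semantic ambiguity of the query words.

Second, the thesis proposes a method that uses the query-word topic clustering results and the word-to-word relations in WordNet to build a model of the domain knowledge behind a user's query interests, namely the User-Query Semantic Ontology (UQSO). UQSO describes in detail the domain in which a user's interests lie and forms the basis of personalized topic retrieval. The ontology expresses the user's interest preferences. In future it can be used to derive user groups and group preferences and then be applied to topic search engines, raising information collection from keyword-based relevance matching to semantic-level search, so that information better suited to the user's latent intent can be extracted and personalized topic search achieved.

Finally, the approach was verified experimentally using the Protégé 2000 ontology construction tool and C++. The query word set of one user was clustered, and that user's User-Query Semantic Ontology (UQSO) was built with the help of WordNet. The experiments show that, with the proposed ontology construction method, the true meaning of user query words can be better distinguished according to the user's interests and background knowledge, and their semantic ambiguity eliminated. UQSO therefore lays a foundation for personalized topic search.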

【Abstract】 Because user query words are brief and semantically ambiguous, most search engines have difficulty understanding what a query means. How can a topic search engine both accurately understand the information need a user submits and possess the relevant semantic knowledge about the information sources it queries? How can it automatically return accurate and relevant information to each individual user when "different users enter the same query keywords" and when "the same user inputs different query keywords"? These are our main research issues. Most search engines gather a large number of user query logs, which record users' past queries and clicks and reflect their interests and domain knowledge to varying degrees. The more records a user has, the more accurately the user's domain knowledge can be characterized. An ontology has a good concept hierarchy and supports logical reasoning. It can express semantics through the relationships between concepts, and it can provide the knowledge base for semantic search and concept search. WordNet contains a large number of word relations, such as "synonym", "near-synonym", "is_a" and "part_of", which reflect expert knowledge. Using rich user query logs and the semantic relations in WordNet to construct an ontology as the semantic background of a topic search engine therefore offers wide scope for developing a new generation of personalized topic information retrieval systems. Studying the relations between user query words and clicked web pages in the historical records, and constructing a model of the semantic relations between user query words that reflects the user's personalized knowledge, has become particularly important. The main contents of this thesis are summarized as follows.

First, we present a new method of personalized user-query semantic clustering that classifies user query words into topics according to the user's personal interests and background knowledge. User query logs contain a wealth of access history records, and these records reflect user interests and domain knowledge to some extent. We first propose three semantic relations based on user query logs: one based on the query words themselves, one based on the user's query click sequence, and one based on the content of the documents the user clicked. By analysing these three semantic relations, we then propose a novel method for computing the semantic similarity of user queries. From this similarity we obtain a cluster similarity function, and with a hierarchical agglomerative clustering algorithm we cluster user query terms into semantic topics based on the topics reflected in the query logs, so as to resolve the semantic ambiguity of user query words.

Secondly, we propose a method for constructing a user-query semantic ontology (UQSO), a model of the domain knowledge behind a user's query interests, using the user-query semantic clustering results and the word relations in WordNet. UQSO describes the domain knowledge of a user's interests and forms the basis of a personalized topic search engine. The ontology expresses the user's interest preferences, from which user groups and group preferences can later be established. Applied to search engines, this raises information collection from keyword similarity matching to semantic-level querying and makes it possible to provide users with more suitable information, thus achieving the purpose of personalized search.

Finally, we use the Protégé 2000 ontology construction tool and the VC++ programming language for experimental verification, clustering the query word set of one user and using WordNet to build that user's user-query semantic ontology (UQSO). Our experiment shows that, with this ontology construction method, the true meaning of user query words can be better distinguished according to the user's interests and background knowledge, and the semantic ambiguity of queries can be eliminated. UQSO can therefore serve as a foundation for realizing personalized topic search.
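The clustering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual similarity formulas, so everything concrete here is an assumption made for illustration: Jaccard overlap for the query-word and click relations, cosine similarity over bag-of-words counts for the clicked-document relation, the weights `(0.2, 0.4, 0.4)`, average linkage, the stopping threshold `0.3`, and the hypothetical `toy_log` data. It is not the author's implementation.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(c1, c2):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def query_similarity(q1, q2, log, weights=(0.2, 0.4, 0.4)):
    """Weighted sum of three relations (the weights are assumed, not from the thesis)."""
    w_term, w_click, w_doc = weights
    s_term = jaccard(q1.split(), q2.split())              # query words themselves
    s_click = jaccard(log[q1]["urls"], log[q2]["urls"])   # clicked pages
    s_doc = cosine(Counter(log[q1]["text"].split()),      # clicked document content
                   Counter(log[q2]["text"].split()))
    return w_term * s_term + w_click * s_click + w_doc * s_doc


def cluster_queries(log, threshold=0.3):
    """Average-linkage agglomerative clustering; stops when no pair reaches the threshold."""
    clusters = [[q] for q in log]

    def avg_link(c1, c2):
        total = sum(query_similarity(a, b, log) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), avg_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        merged = clusters.pop(j)      # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters


# Hypothetical log for one user: query -> clicked URLs and clicked-page text.
toy_log = {
    "jaguar speed": {"urls": {"zoo.example/jaguar", "wild.example/cats"},
                     "text": "jaguar big cat animal speed predator"},
    "big cat habitat": {"urls": {"wild.example/cats", "zoo.example/jaguar"},
                        "text": "big cat animal habitat jungle predator"},
    "jaguar dealer": {"urls": {"cars.example/jaguar"},
                      "text": "jaguar car dealer price sedan"},
    "sedan price": {"urls": {"cars.example/jaguar", "cars.example/sedan"},
                    "text": "car sedan price dealer"},
}

print(cluster_queries(toy_log))
```

On this toy log the two animal-related queries end up in one cluster and the two car-related queries in another, even though "jaguar speed" and "jaguar dealer" share a word. That is the kind of disambiguation the abstract attributes to combining click and content evidence with the query words themselves.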

  • 【Online publication contributor】 Xihua University
  • 【Online publication year and issue】 2011, Issue 05