Research on Neural-Network-Based User Modeling and Web Information Filtering

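【Author】 Dai Xuewu (代学武)

【Supervisor】 Li Jianguo (李建国)

【Author Information】 Southwest Normal University (西南师范大学), Computer Application Technology, 2003, Master's thesis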

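【Chinese Abstract】 Personalized Web services are currently one of the most active research topics in artificial intelligence and information technology. Information retrieval based mainly on search engines does not account for differences in users' interests and preferences, so the same keywords always return the same results. The returned results are also of uneven quality, and users must search through a great deal of noise to find useful information. Improving Internet information retrieval systems with information filtering has therefore become a very important research direction, and it is the foundation of personalized services.
User models enable better information filtering. A user model (User Model) is a description of a category of users or of a single user. With a user model, a computer can represent, store, and reproduce a user's vague and changing interests. The user information the model stores forms the filtering criteria, which makes information filtering more effective.
This thesis first analyzes current research on, and applications of, personalized text retrieval on the Internet. It also reviews the state of soft computing, as represented by fuzzy logic and neural networks. Drawing on the ANFIS network, it proposes a method that applies neuro-fuzzy network techniques to user modeling in order to build a personalized adaptive user model, which is then used for Web information filtering. The following key issues are discussed:
(1) Representation of Web pages and user interests. The vector space model (VSM) is used to map each Web page to a content vector P_j. Before filtering, the user enters search keywords and several example Web pages, and word segmentation is applied to extract the user's interest vectors u_i.
(2) The goal of information filtering is to divide Web pages into a set R of relevant pages and a set R̄ of irrelevant pages. A term's local weight differs between R and R̄, and the terms whose weights differ most are selected as feature terms. These terms determine and adjust the dimensions of the vector space, reducing its dimensionality while preserving the model's accuracy as far as possible.
(3) Building the user model structure (model structure identification). Using fuzzy set theory, a set of fuzzy IF-THEN rules is constructed and implemented with an ANFIS network. The user's interest vectors u_i and weights r_i are stored in the network as parameters. P_j is the input variable, and the output is the system's judgment Rpred_j of the relevance between P_j and u_i.
(4) Optimizing the user model parameters (model parameter identification). A Candidate/Rank mode is adopted, and the parameters are adjusted in a "learning-filtering-feedback-relearning-refiltering ..." cycle. The difference between the user's relevance feedback Rusr_j and Rpred_j serves as the error signal. The weights r_i are optimized by online learning with the Widrow-Hoff algorithm. Once interaction with the user has reached a certain level, offline learning is performed to adjust u_i.
Based on this discussion, a prototype information filtering system, AUM&IF, built on a fuzzy neural network user model, was implemented and used to filter Web pages returned by Google. A comparison of accuracy before and after filtering confirms the system's effectiveness. Some ideas in this thesis may be useful for similar applications.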
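Steps (1) and (2) can be illustrated with a minimal sketch. The abstract does not say how term weights or "local weights" are computed. This example therefore assumes two things: the weight of a term in a page is its normalized term frequency, and its local weight in a page set is the mean of those weights over the set. Pages are assumed to be already word-segmented into token lists. The function names (`tf_vector`, `local_weights`, `select_feature_terms`, `to_vector`) are hypothetical and do not come from the thesis.

```python
from collections import Counter


def tf_vector(tokens):
    """Normalized term-frequency weights of one page (tokens assumed already segmented)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def local_weights(pages):
    """Assumed 'local weight': mean normalized term frequency of each term over a page set."""
    sums = Counter()
    for page in pages:
        for term, w in tf_vector(page).items():
            sums[term] += w
    return {term: s / len(pages) for term, s in sums.items()}


def select_feature_terms(relevant, irrelevant, k):
    """Keep the k terms whose local weights differ most between R and R-bar."""
    w_r = local_weights(relevant)
    w_rbar = local_weights(irrelevant)
    vocab = set(w_r) | set(w_rbar)
    diff = {t: abs(w_r.get(t, 0.0) - w_rbar.get(t, 0.0)) for t in vocab}
    # Largest difference first; ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-diff[t], t))[:k]


def to_vector(tokens, features):
    """Project a page onto the reduced space spanned by the selected feature terms."""
    tf = tf_vector(tokens)
    return [tf.get(t, 0.0) for t in features]
```

Consider two toy relevant pages, [["neural", "fuzzy", "filter"], ["neural", "network", "user"]], and two irrelevant ones, [["football", "score", "user"], ["weather", "rain", "score"]]. For these, `select_feature_terms(relevant, irrelevant, k=3)` returns ["neural", "score", "filter"]. It keeps terms that are frequent in one set but rare in the other and drops "user", which has the same local weight in both. Under the same assumptions, the user's query words and example pages could be passed through `to_vector` to obtain an interest vector u_i in the reduced space.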

【English Abstract】 Personalized Web service is one of the hot topics in AI and information technology. Current information retrieval systems, which are mainly based on search engines, do not pay enough attention to users' differing interests: users get the same results whenever they submit the same query words. The results are also of mixed quality, and users have to pick out suitable information from a huge number of Web pages by hand. It is therefore important to improve current information retrieval systems with information filtering, which is the basis of personalized information services.
User modeling can enhance the performance of information filtering. A user model is a description of a user group or an individual user. With a user model, a computer can acquire, store, and restore a user's fuzzy, dynamic interests. The information stored in the user model constitutes the filtering criteria and makes filtering more effective.
This thesis analyzes current research on, and applications of, personalized Internet information retrieval, and introduces soft computing, including fuzzy logic and neural networks. Based on ANFIS, an improved neuro-fuzzy network is applied to user modeling and Web information filtering to better satisfy the user. The following key problems are discussed:
(1) How to represent Web page content and user interests. The vector space model is used to map each Web page into a vector P_j. Before filtering, the query words and example pages supplied by the user are analyzed and also mapped into vectors u_i.
(2) How to select feature terms to reduce the number of dimensions. In information filtering (IF), the objects being filtered are the retrieved Web pages. These pages fall into two classes: relevant pages R and irrelevant pages R̄. The difference in each term's local weight between R and R̄ is computed, and the terms with the largest difference are chosen as feature terms.
(3) How to model the user and filter information. Based on fuzzy set theory, a group of fuzzy IF-THEN rules is constructed and implemented by ANFIS. The user's interest vectors u_i and weights r_i are stored in ANFIS as parameters. P_j is the input variable, and the output variable is the relevance between P_j and u_i, named Rpred_j.
(4) How to optimize and adjust the parameters. A Candidate/Rank mode is adopted, and the parameters are optimized in a "training-filtering-feedback-retraining-refiltering" cycle. The difference between the user's feedback Rusr_j and the ANFIS output Rpred_j is taken as the error. The weights r_i are optimized online with the Widrow-Hoff algorithm, and the vectors u_i are optimized by offline batch learning.
Based on this discussion, AUM&IF, a prototype user-model-based Web filtering system, was built and evaluated by comparing its performance with that of analogous systems. The results show that user modeling techniques can improve the performance of Web information filtering systems, and they point to interesting challenges for future work. Some ideas in this work may be helpful for similar applications.
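Items (3) and (4) can be sketched as a zero-order Sugeno-type fuzzy system in the spirit of ANFIS. This is not the thesis's exact network. It rests on these assumptions:
- Each interest vector u_i gets one rule: "IF P_j is close to u_i THEN relevance is r_i".
- Membership functions are Gaussian in Euclidean distance, with a hand-picked width.
- The output Rpred_j is the firing-strength-weighted average of the r_i.
- Feedback values lie in [0, 1].

The online step is a standard Widrow-Hoff (LMS) update of r_i, driven by the error Rusr_j - Rpred_j. The offline step moves u_i toward the pages the user rated relevant. It is a hypothetical Rocchio-style heuristic standing in for the thesis's unspecified batch procedure. The class and method names (`FuzzyUserModel`, `firing`, `predict`, `online_update`, `batch_update`) are also hypothetical.

```python
import math


class FuzzyUserModel:
    """Zero-order Sugeno-style sketch: rule i reads 'IF page is near u_i THEN relevance is r_i'."""

    def __init__(self, interests, sigma=0.5, lr=0.2, r_init=0.5):
        self.u = [list(v) for v in interests]  # interest vectors u_i (rule premises)
        self.r = [r_init] * len(self.u)        # rule weights r_i (assumed initial value)
        self.sigma = sigma                     # Gaussian membership width (assumed)
        self.lr = lr                           # Widrow-Hoff learning rate (assumed)

    def firing(self, p):
        """Normalized firing strengths wbar_i of all rules for page vector p."""
        w = [math.exp(-sum((a - b) ** 2 for a, b in zip(p, u)) / (2 * self.sigma ** 2))
             for u in self.u]
        total = sum(w)
        if total == 0.0:  # every membership underflowed: fall back to uniform weights
            return [1.0 / len(w)] * len(w)
        return [x / total for x in w]

    def predict(self, p):
        """Rpred_j = sum_i wbar_i * r_i."""
        return sum(wb * r for wb, r in zip(self.firing(p), self.r))

    def online_update(self, p, r_usr):
        """One Widrow-Hoff (LMS) step on r_i; returns the error Rusr_j - Rpred_j before the step."""
        wbar = self.firing(p)
        err = r_usr - sum(wb * r for wb, r in zip(wbar, self.r))
        self.r = [r + self.lr * err * wb for r, wb in zip(self.r, wbar)]
        return err

    def batch_update(self, pages, feedback, alpha=0.5):
        """Hypothetical offline step: pull each u_i toward the centroid of pages rated relevant,
        weighting each page by rule i's firing strength times the user's feedback."""
        strengths = [self.firing(p) for p in pages]  # computed once with the current u_i
        new_u = []
        for i, u in enumerate(self.u):
            weights = [s[i] * r_usr for s, r_usr in zip(strengths, feedback)]
            den = sum(weights)
            if den == 0.0:  # no relevant evidence for this rule: keep u_i unchanged
                new_u.append(u)
                continue
            centroid = [sum(w * p[k] for w, p in zip(weights, pages)) / den
                        for k in range(len(u))]
            new_u.append([(1 - alpha) * a + alpha * c for a, c in zip(u, centroid)])
        self.u = new_u
```

For example, take u_1 = (1, 0) and u_2 = (0, 1). Repeatedly calling `online_update([1.0, 0.0], 1.0)` drives `predict([1.0, 0.0])` toward 1. Most of the change falls on r_1, because rule 1's normalized firing strength dominates for that page. In this sketch, only the consequents r_i, on which the output depends linearly, are updated online. The nonlinear premise parameters u_i are left to the less frequent batch step, a split commonly used in ANFIS-style hybrid learning.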

  • 【Classification No.】TP183; TP393.092
  • 【Times Cited】9
  • 【Downloads】274