
Design and Implementation of an Email Management System Based on Entities and Information Network Knowledge Extraction

【Author】 Wen Jie (文捷)

【Advisor】 Wang Wei (汪卫)

【Author information】 Fudan University, Computer Software and Theory, 2010, Master's degree

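【Abstract, translated from the Chinese】 As the data and information held by individual email users grow exponentially, personal information management has become a hot research topic. Email, as an important carrier of personal information, occupies an important position in personal information services. As personal information accumulates, users find it increasingly difficult to manage and use their email. For example, when searching, users often forget the keywords they need, or the search results are unsatisfactory. Ordinary email tools can do little to help users organize and manage their personal information better. At the same time, research on entities has become increasingly popular; the recognition, classification, and application of entities have drawn attention across the field, and this research is inseparable from the documents that carry entities. Email is the most common carrier of entities in daily life. This thesis therefore uses the properties of entities and their close connection with email as the basis for data mining and analysis, in order to help individual email users organize and manage the personal information in their email. In addition, owing to the particular nature of the email system, an individual email user and the users he or she corresponds with naturally form an information network. As information networks become ubiquitous, extracting knowledge from them has become an important task. As the research progressed, this thesis therefore attempts to extract knowledge and useful information from the information network formed by an individual email user and his or her correspondents, to help users better organize email data and manage personal information. In knowledge extraction from information networks, the two most popular approaches at present are ranking and clustering. Both can give users an overview of the information network data, and each is an active research direction. It should be noted, however, that ranking and clustering cannot be treated in isolation: ranking without clustering often produces large amounts of meaningless data, and similarly, lumping a large amount of data into a single cluster without distinction is in most cases also meaningless. After studying the latest research on ranking and clustering in information networks, this thesis proposes several improved methods based on the characteristics of the email system itself, which help users manage the importance of users and mail data in the email information network and improve the precision of everyday query results. The thesis first presents an email management system based on entity discovery, search, and management; then, after further study, it proposes an improved scheme that builds on information network knowledge extraction, draws on prior work and the characteristics of this research, and applies improved clustering and ranking methods to effectively alleviate the problems above. It also puts forward its own ideas and processing mechanisms for implementing the key techniques (Chinese word segmentation, entity mining, entity association management, graphical presentation of query results and of the information network structure, and ranking and clustering), achieving the goal of improving users' email management efficiency.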

【Abstract】 With the growth of individual users' information and data, research on personal information management (PIM) is becoming increasingly popular and intense. As an important information repository, email plays an important role in PIM research. But as personal information increases, individual users face more and more difficulty in using and managing it. For example, they often forget the query keywords when they try to find important information in their email, or they are not satisfied with the query results. This means that ordinary email tools do little to help individual users organize and manage their information. At the same time, research on entities has become a hot topic, and more and more work addresses the identification, categorization, and application of entities, all of which are closely tied to the media that carry entities. Email is the most common such medium. Based on the features of entities and the connection between entities and email, this thesis helps individual users organize and manage the personal information in their email more effectively. Furthermore, because of the nature of email, an information network forms naturally between an individual email user and the users who communicate with him or her. With the spread of information networks, extracting knowledge from them has become an important task. As the research deepens, this thesis studies email data further in order to extract useful knowledge and information from the network between an email user and his or her contacts. In knowledge extraction from information networks, the two most popular methods are ranking and clustering. Both can provide overall views of information network data, and each has been a hot topic. However, ranking objects globally without considering which clusters they belong to often leads to dumb results. Similarly, clustering a huge number of objects into one huge cluster without distinction is dull as well. After studying the latest research on ranking and clustering, this thesis proposes new methods for managing the importance of contacts and mail. We propose an email management system based on entity mining, querying, and processing, and improve the system after further study. Drawing on the features of the information network and on earlier work, we improve the ranking and clustering methods. We also present our ideas for implementing the key techniques: Chinese word segmentation, entity mining, entity association management, graphical display of results and of the network, and ranking and clustering. Finally, our system helps users improve their working efficiency with email.

  • 【Online publication contributor】 Fudan University
  • 【Online publication year and issue】 2012, Issue 02
  • 【Classification number】 TP393.098
  • 【Download count】 58