
网络多媒体教育资源数据库检索研究

Research on Retrieval of Network Multimedia Educational Resources Database

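【Author】 Yuan Jiali (原佳丽)

【Supervisor】 Meng Xiangzeng (孟祥增)

【Author information】 Shandong Normal University, Educational Technology, 2009, Master's thesis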

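【Abstract, translated from the Chinese】 A continually developing society places ever higher demands on education. As a modern teaching method, multimedia instruction has effectively promoted the informatization of education and actively driven its reform and development. Multimedia instruction depends on multimedia educational resources, and the Internet has now become the world's largest repository of such resources. Search engines are people's close helpers for obtaining information online, but general-purpose search engines mostly rely on keyword-based retrieval, and using them to find the various media resources needed for teaching and learning is often inefficient. Building on a study of content-based multimedia retrieval, this thesis improves a network multimedia database retrieval system oriented to basic education, with the aim of providing primary and secondary school teachers, students and other related users with efficient, specialized retrieval of network multimedia resources.

The thesis organizes basic-education multimedia subject terms according to primary and secondary school textbooks, then searches the Internet for multimedia educational resources related to those terms and downloads them. It then analyzes and extracts the relevant attributes of the multimedia and builds an attribute index database of multimedia educational resources. Content-based retrieval of image, animation (Flash), video and audio databases is studied, and a retrieval system for a network multimedia educational resources database is implemented with ASP.

The retrieval system is the main subject of the thesis. At the start of retrieval, the system processes two query texts submitted by the user: the multimedia content description and the color description. The thesis proposes a new Chinese word segmentation algorithm, the fast bidirectional segmentation algorithm, and develops a segmentation module based on it to segment the content-description query text. Filtering out words with no real meaning, together with the default words configured in the system, leaves the key information that describes the target multimedia content. The system uses this information to compute the content-description similarity between the target multimedia and the database multimedia. In addition, the system converts the color names in the color query text into HSI color model values so that the color similarity between the target multimedia and the database multimedia can be computed.

Images, animations, video and audio each have their own characteristics and attributes. The retrieval system sets retrieval conditions according to their main attributes, and these conditions correspond to the main fields of the multimedia tables in the database. The thesis uses similarity to measure the gap between the target media and the media in the database: the system compares the query information that the user supplies for each retrieval condition with the corresponding field values of the records in the database tables, and computes the similarity from that comparison. Similarity is computed differently for different retrieval conditions. For simple conditions such as format and size, the system uses Boolean retrieval: the similarity is 1 only when the value supplied by the user strictly matches the stored value, and 0 otherwise. For more complex conditions such as content and color, the system uses fuzzy retrieval, with a different fuzzy algorithm for each condition. For example, the system compares the processed content-description query text with the content-description field value of each record and defines their synonym ratio as the content similarity between the target media and the database media. The total similarity of a multimedia item is the product of its individual similarities.

To improve retrieval efficiency, the system builds indexes on the content-description fields of the tables in the multimedia database, which speeds up retrieval on the content-description condition. Before outputting results, the system places the result record set in a cache, which shortens the time users spend paging through the output. The thesis also studies how to improve the execution efficiency of ASP and improves the program code of the retrieval system.

After logging in to the retrieval system, the user describes the target media and issues a retrieval request. The system then automatically processes the query information, computes the similarity between each database media item and the target media, and returns the preview images, similarity values and other related information of the records that satisfy the conditions. Preliminary experimental results show that, for records in the multimedia database tables whose attribute information is annotated accurately and in detail, the retrieval results are fairly accurate, and that indexes, caching and similar measures noticeably increase retrieval speed.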
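
The similarity scheme described in the abstract above (Boolean matching for simple conditions, a fuzzy ratio for content, and a product for the total) can be illustrated with a minimal Python sketch. It is not the thesis's ASP implementation. The abstract does not define the "synonym ratio" precisely, so the sketch assumes it is the fraction of query terms that appear in the stored description either directly or through a synonym; the names `SYNONYM_GROUPS`, `boolean_similarity`, `content_similarity` and `total_similarity` are hypothetical.

```python
from math import prod

# Hypothetical synonym groups; the thesis does not publish its thesaurus.
SYNONYM_GROUPS = [
    {"picture", "image", "photo"},
]


def expand(term):
    """Return the term together with its synonyms (just the term if none are known)."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return group
    return {term}


def boolean_similarity(query_value, stored_value):
    """Boolean condition (e.g. format, size): 1.0 on a strict match, else 0.0."""
    return 1.0 if query_value == stored_value else 0.0


def content_similarity(query_terms, stored_terms):
    """Assumed 'synonym ratio': share of query terms matched in the stored description."""
    if not query_terms:
        return 1.0  # assumption: an empty condition places no constraint
    stored = set(stored_terms)
    hits = sum(1 for term in query_terms if expand(term) & stored)
    return hits / len(query_terms)


def total_similarity(similarities):
    """Total similarity is the product of the per-condition similarities."""
    return prod(similarities)
```

Because the total is a product, a single failed Boolean condition (similarity 0) removes a record from the results regardless of how well its content description matches.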

【Abstract】 As society develops, it places increasing demands on education. As a modern teaching method, multimedia instruction has actively and effectively promoted the informatization, reform and development of education. Multimedia instruction requires multimedia educational resources, and the Internet has become the largest library of such resources. Search engines help people obtain information from the Internet. However, most general search engines are keyword-based, and they are often inefficient at finding the variety of media needed for teaching and learning. Building on a study of content-based multimedia retrieval, this thesis improves a retrieval system for network multimedia databases in elementary education, with a view to providing teachers, students and other related users with an efficient, professional retrieval service for network multimedia resources.

The thesis organizes multimedia keywords according to primary and secondary school textbooks, and then searches for and downloads related multimedia educational resources from the Internet. It then analyzes and extracts the properties of those resources and establishes index databases of multimedia educational resources. It studies content-based retrieval of image, Flash, video and audio databases, and develops a retrieval system for a network multimedia educational resources database with ASP.

The retrieval system is the main subject of this thesis. At the beginning of retrieval, the system processes the content and color descriptions of the target media. The thesis proposes a new algorithm for Chinese word segmentation, the fast two-way algorithm, and develops a Chinese word segmentation module to split the text of the content description. The system then filters out uninformative words to obtain the key information of the content description. The color names in the color description are converted to values in the HSI color model.

Image, Flash, video and audio media each have their own characteristics and properties. The retrieval system sets retrieval conditions according to the main properties of each media type, and these conditions correspond to the main fields of the multimedia database tables. Similarity is used to measure the gap between the target media and the media in the library. The system calculates this similarity by comparing the query information with the values of the corresponding fields. Different conditions use different methods of similarity calculation. For simple retrieval conditions, such as format and size, the system uses Boolean matching: the similarity is 1 only when the target media and the library media match strictly, and 0 otherwise. For other conditions, such as content and color, the system uses fuzzy matching, with a different fuzzy similarity calculation for each condition. The total similarity of a multimedia item is the product of its individual similarities.

To improve retrieval efficiency, the system indexes the content fields of the multimedia database tables, which speeds up retrieval on the multimedia content description. Before the system presents results to users, it places them in a cache, which reduces paging time. In addition, the thesis discusses how to improve the efficiency of ASP, and the program code is improved accordingly.

Users log in, describe the target media and submit retrieval requests. The retrieval system then processes the query information automatically and computes the similarity of each library media item. Finally, the results are returned to users, including previews, similarity values and other related information. Experimental results show that, for records whose index information is accurate and detailed, the retrieval results have higher accuracy, and that the use of indexes and caching clearly speeds up retrieval.
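
The abstract states that color names in the query are converted to HSI values but does not say how. The following is a minimal Python sketch, assuming a small lookup table from color names to RGB triples (the `COLOR_NAMES` table is hypothetical) followed by the standard textbook RGB-to-HSI conversion; the thesis's own mapping may differ.

```python
import math

# Hypothetical name table with RGB components in 0..255; not taken from the thesis.
COLOR_NAMES = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
}


def rgb_to_hsi(r, g, b):
    """Convert 0..255 RGB to (hue in degrees, saturation in 0..1, intensity in 0..1)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    saturation = 0.0 if intensity == 0 else 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        hue = 0.0  # achromatic color: hue is undefined, 0 is used by convention here
    else:
        # Clamp to guard against rounding slightly outside acos's domain.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, numerator / denominator))))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


def color_name_to_hsi(name):
    """Look up a color name and return its HSI triple; raises KeyError if unknown."""
    return rgb_to_hsi(*COLOR_NAMES[name])
```

With HSI triples for both the query color and a stored color, a fuzzy color similarity could then be derived from their distance; the abstract does not specify that formula, so it is left out of the sketch.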
