
Design and Implementation of CRBT Service Domain Oriented Intelligent Search Engine

【Author】 隋毅 (Sui Yi)

【Advisor】 王纯 (Wang Chun)

【Author Information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2008, Master's thesis

【Abstract】 Color Ring Back Tone (CRBT) is a service, set up by the called (or calling) user, that plays the caller a pleasant piece of music or a greeting in place of the ordinary ringback tone. After subscribing to CRBT, users can set their own personalized ringback tones; when they are called, the caller hears the customized music or recording instead of the ordinary ringback tone. With the rapid growth of the CRBT service in recent years, the number of tones on the CRBT platform has risen day by day. Users now face tens of thousands of tones, and the huge variety of tones created by different tone producers leaves them increasingly unsure which to choose. The traditional tone lookup offered through the existing access channels can no longer meet users' needs.

Meanwhile, the innovation in search technology led by Google has driven rapid progress in the field: algorithms for word segmentation, indexing, and ranking keep emerging, open-source search engine tools such as Lucene and Nutch have appeared, and search technology has become increasingly mature. Vertical search is one of the main current directions of development. It is a specialization and extension of general search: it integrates one specific category of information from a web page collection, extracts the required data field by field, processes it, and returns it to the user in some form. The biggest difference between a vertical search engine and an ordinary web search engine is that the vertical engine performs structured information extraction, turning unstructured data into specific structured records. A web search engine treats the web page as its smallest unit, whereas a vertical search engine treats the structured record as its smallest unit. These records are then stored in a database for further processing. The CRBT intelligent search engine described in this thesis is an efficient, intelligent vertical search engine built for the CRBT platform on top of existing search technology.

Chapter 1, the introduction, briefly surveys the current state of vertical search engines. Chapter 2 gives an overview of the CRBT platform and analyzes its characteristics in terms of network architecture, data, and access methods. Chapter 3 presents the key technologies currently used in search engines and their future trends. Chapter 4, one of the focal points of the thesis, first analyzes statistics of the data on the CRBT platform, then studies the feasibility of applying search engine technology to it and sets out the capabilities the target system should have. It next designs the search flow for each search mode and, after a full analysis of the system's functions, proposes a fairly detailed system framework design and defines the interaction protocols with external functional entities. Chapter 5 focuses on the key techniques used in CRBT intelligent search: word segmentation, fuzzy matching, and the weighting algorithm. It discusses in detail the SKM algorithm, a fuzzy matching algorithm developed for the characteristics of CRBT platform data; its third section covers the distinctive weighting algorithm used to rank search results, explaining how weights are computed for objects such as single characters, keywords, and ring tones. Chapter 6 uses existing test data to compare the efficiency of these algorithms with that of known algorithms and discusses their performance in detail.
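The abstract names word segmentation, fuzzy matching, and a weighting scheme over single characters, keywords, and ring tones, but gives no details of the SKM algorithm or of the actual weights. Purely as an illustration, the sketch below combines a textbook forward maximum matching (FMM) segmenter with a simple field-weighted score that gives partial credit for single-character overlap. The `RingTone` record, its fields, `FIELD_WEIGHTS`, `KEYWORD_BONUS`, `CHAR_CREDIT`, and the toy catalog are all hypothetical; FMM is a stand-in for whatever segmenter the thesis uses, and the character-overlap fallback is not a reconstruction of SKM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RingTone:
    """Hypothetical structured ring-tone record; the field names are assumptions."""
    tone_id: str
    title: str
    singer: str


# Hypothetical weights, chosen only for illustration.
FIELD_WEIGHTS = {"title": 1.0, "singer": 0.8}
KEYWORD_BONUS = 2.0  # credit per character when a whole token occurs in a field
CHAR_CREDIT = 0.5    # partial credit per character when the whole token misses


def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word (up to max_len characters), falling back to a single character."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def score(tone: RingTone, tokens: list[str]) -> float:
    """Field-weighted score: full credit for whole-token hits, partial
    per-character credit otherwise (a crude fuzzy fallback, not SKM)."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = getattr(tone, field)
        for tok in tokens:
            if tok in value:
                total += weight * KEYWORD_BONUS * len(tok)
            else:
                total += weight * CHAR_CREDIT * sum(ch in value for ch in tok)
    return total


def search(tones: list[RingTone], query: str, dictionary: set[str]) -> list[str]:
    """Return the IDs of tones with a positive score, best first."""
    tokens = fmm_segment(query, dictionary)
    scored = [(score(t, tokens), t.tone_id) for t in tones]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [tone_id for s, tone_id in ranked if s > 0]


if __name__ == "__main__":
    words = {"周杰伦", "稻香", "青花瓷"}  # toy segmentation dictionary
    catalog = [
        RingTone("t1", "稻香", "周杰伦"),
        RingTone("t2", "青花瓷", "周杰伦"),
        RingTone("t3", "香水有毒", "胡杨林"),
    ]
    print(fmm_segment("周杰伦稻香", words))   # ['周杰伦', '稻香']
    print(search(catalog, "周杰伦稻香", words))  # ['t1', 't2', 't3']
```

With this toy catalog, t1 matches both the singer and the title, t2 matches only the singer, and t3 ranks last because it earns only character-level credit for the shared character 香. FMM was picked because it is the simplest dictionary-based Chinese segmenter; a real system would tune the weights against measured data rather than use the placeholder constants above.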

【Keywords】 CRBT; Search engine; Segmentation algorithm; Fuzzy matching
  • 【Classification Number】 TP311.52
  • 【Downloads】 159