

The Study of Telephone Content Text Classification Based on Ontology

【Author】 Zheng Dan (郑丹)

【Advisor】 Yang Xiquan (杨喜权)

【Author information】 Northeast Normal University, Computer Application Technology, 2008, Master's thesis

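【Abstract (translated from the Chinese)】 The rapid growth of the Internet has diversified the ways in which it is accessed. People are no longer content to browse the Internet only through a computer browser; more and more of them want to browse web pages with telephones, mobile phones, and other communication devices. Compared with images and text, whose expressive power is limited, people prefer to communicate in natural language, so friendly voice interaction is increasingly favored. VoiceXML is built on the XML specification and is a standard for exchanging voice data. It gives users a platform for accessing network resources by voice. As a voice data exchange standard, VoiceXML can exchange data seamlessly with databases and with other documents built on XML standards, which ties the Internet and the telephone network closely together. A VoiceXML voice gateway submits user documents to the server. As the volume of submitted information grows, the server comes under heavy pressure when processing these documents, and there is an urgent need to classify the information automatically and then handle the documents of each category separately. Retrieving and classifying information by keywords alone has not given satisfactory accuracy or efficiency, because the computer cannot understand the semantic information that the keywords carry. To capture semantic information better, this thesis introduces the concept of ontology. An ontology can be used to describe and analyze the semantics of keywords, and ontology modeling can express deeper semantic information. Traditional retrieval algorithms rely only on simple syntactic matching of characters and words and lack the ability to represent, process, and understand knowledge. The key to solving these problems is to raise information retrieval from keyword-based syntactic matching to semantic matching at the level of knowledge (or context). An ontology is a knowledge representation tool, and practical applications may require logical reasoning based on rules. Ontology reasoning means extracting the knowledge that is implicit in explicit definitions and declarations. An ontology is a formal specification of a shared conceptual model and a description of knowledge; to apply it to semantic analysis, rules must be used and reasoning must be carried out over those rules. Predicate logic is an important means of expressing knowledge reasoning. A rule base can be built on top of a knowledge base represented as an ontology and used to analyze the semantic information of text. This thesis uses the OWL language to describe domain knowledge and a rule system to express the inference rules. Many tools exist for editing and developing ontologies; this thesis adopts Protégé 3.2.1 from Stanford University as the platform for building the ontology. On this platform we built, as a simulation, part of an ontology for school logistics management, and on the basis of that ontology we constructed a rule set for reasoning over text information. To solve the problem of automatic text classification, this thesis proposes ontology-based classification of telephone content. An ontology is a modeling tool that can describe knowledge models at the semantic and knowledge levels; applied to text classification, it improves both the precision and the speed of classification.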

【Abstract】 The Internet is developing rapidly, and the ways of accessing it are diversifying. People are no longer satisfied with browsing the Internet only through a computer browser such as Internet Explorer; a growing number of users want to view web pages by telephone or mobile phone rather than on a computer screen. People prefer communicating in natural language to working with images and text, so friendly voice interfaces are becoming increasingly popular. VoiceXML is a standard for exchanging voice data that is built on XML. It provides a platform for accessing Internet resources by voice. VoiceXML can exchange data seamlessly with databases and with other documents based on XML standards, and so it ties the Internet and the telephone network closely together.

A VoiceXML voice gateway submits users' documents to the server. As the number of documents grows rapidly, the server comes under heavy pressure, and the information needs to be classified automatically so that the documents in each category can be handled separately. Information used to be searched and classified by keywords alone, but this does not work well, because the computer cannot understand the semantic meaning implied by the keywords. Ontology is introduced to address this semantic problem. An ontology can be used to describe and analyze the semantic meaning of keywords, and ontology models can express the implied semantic information. Classic search algorithms match words only at the syntactic level and lack the ability to represent, process, and understand knowledge. The key to solving these problems is to match by semantics instead of syntax.

An ontology is a tool for describing knowledge and a form of knowledge representation. It can also serve as the basis for rule-based logical reasoning. Ontology reasoning means extracting the implied knowledge from explicit definitions and statements. An ontology is an explicit specification of a shared conceptualization, which is a kind of description of knowledge. If an ontology is to be used for semantic analysis, rules must be introduced, and reasoning is carried out over those rules. Predicate logic is an important means of expressing knowledge reasoning. A rule system for analyzing the semantic information of text can be constructed on top of an ontology-based knowledge repository.

OWL is used here to describe the domain knowledge, and a rule system is used to express the reasoning mechanism. Many tools are available for editing and developing ontologies. Protégé 3.2.1, developed at Stanford University, is the platform used here to construct the ontology. Protégé is an open ontology editor that is extensible and based on Java, and it provides many plug-ins and APIs. As a simulation, we build part of an ontology for college logistics administration, and on that ontology we build a rule system that is used to reason over the text information.

To solve the problem discussed above, we propose classifying the text content of telephone calls by means of an ontology. An ontology is a modeling tool for expressing semantic meaning and knowledge. Applied to text classification, it increases both precision and speed.
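The idea summarized above (map keywords to ontology concepts, then apply a rule to infer the category) can be illustrated with a minimal Python sketch. It is not the thesis's OWL/Protégé implementation: the class hierarchy, the keyword lexicon, the category set, and the function names `ancestors` and `classify` are all hypothetical, and the only rule shown is transitive subclass inference.

```python
from collections import Counter

# Hypothetical subclass-of links for a toy school logistics ontology.
SUBCLASS_OF = {
    "PlumbingRepair": "Maintenance",
    "DormitoryRepair": "Maintenance",
    "MealCard": "Canteen",
    "Maintenance": "LogisticsService",
    "Canteen": "LogisticsService",
}

# Hypothetical lexicon linking surface keywords to ontology concepts.
LEXICON = {
    "faucet": "PlumbingRepair",
    "leak": "PlumbingRepair",
    "dorm": "DormitoryRepair",
    "lunch": "Canteen",
    "card": "MealCard",
}

# Concepts that act as target categories (an assumption of this sketch).
CATEGORIES = {"Maintenance", "Canteen"}


def ancestors(concept):
    """Return the concept followed by all of its superclasses.

    This is the inference rule of the sketch: subclass-of is transitive,
    so knowledge implied by the explicit links is made explicit.
    """
    chain = [concept]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def classify(text):
    """Return the category with the most keyword votes, or None."""
    votes = Counter()
    for token in text.lower().split():
        concept = LEXICON.get(token.strip(".,!?"))
        if concept is None:
            continue  # keyword is not described by the ontology
        for inferred in ancestors(concept):
            if inferred in CATEGORIES:
                votes[inferred] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(classify("The faucet in my dorm has a leak."))
    print(classify("I lost my meal card"))
```

In this sketch no keyword names a category directly; the category is reached only by following the subclass links, which is the sense in which semantic matching differs from plain keyword matching.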

【Keywords (translated from the Chinese)】 Classification; Ontology; VoiceXML
【Key words】 Taxonomy; Ontology; VoiceXML
  • 【Classification number】 TP391.1
  • 【Downloads】 74