可复用资产管理系统中资产检索方法的研究与实现

Research and Implementation of Asset Retrieval in Reusable Asset Management System

【Author】 Li Wenjie (李闻杰)

【Advisor】 Le Jiajin (乐嘉锦)

【Author information】 Donghua University, Computer Software and Theory, 2008, Master's thesis

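【Summary】 As the software industry develops, demand for software grows rapidly and software systems keep getting larger, so more and more software companies recognize the importance of software reuse. The most effective way for a company to practice reuse is to reuse its own assets. A reusable asset management system, grounded in the Reusable Asset Specification proposed by the Object Management Group (OMG), supports describing, storing, and retrieving an enterprise's reusable assets. A main technical problem in developing such a system is how to retrieve the large number of assets it holds: a sound retrieval method greatly reduces the cost of finding and understanding assets, whereas a poor one makes the system harder to use and ultimately causes reuse to fail.

Considering the current state of software reuse in domestic software companies and their requirements, this thesis settles on two retrieval methods: keyword retrieval based on asset manifests (asset entity description files) and domain faceted classification retrieval. They suit the early and later stages, respectively, of an enterprise's adoption of a reusable asset management system, and so accommodate developers' growing experience with reuse. The thesis mainly studies how to implement these two methods, which involves improving several mature retrieval techniques and applying them to the system so that retrieval better meets enterprise needs.

First, the thesis studies the Reusable Asset Specification and implements the asset manifest, an XML document that contains an asset's metadata. Within the <classification> element of the manifest, it implements the asset's keyword description and domain facet term description. This information is used to build the asset's inverted indexes and thereby improve retrieval efficiency.

Second, the thesis describes in detail how traditional information retrieval techniques are used to extract keywords from asset manifests, encode the manifests, and implement keyword retrieval through an inverted index. For keyword extraction, it proposes manually specifying each asset's keyword sequence to compensate for the lack of a dictionary for the software reuse domain, and uses a forward matching algorithm to extract keywords from the manifest. To make results finer-grained and help users reach the most relevant information in an asset, it studies how, for a given query keyword sequence, to use Dewey encoding to find the smallest common ancestor node of the keywords in the manifest tree. To rank results, it studies a formula for the relevance between keywords and an asset manifest, measuring relevance by both the probability distribution of the keywords and their positions in the description document.

In addition, the thesis improves on the traditional facet scheme. After analyzing its shortcomings, it proposes a domain faceted classification scheme based on FODA (Feature-Oriented Domain Analysis). The scheme divides all facets into three layers, and the facet group in each layer corresponds to one of the three phases of FODA: determining the domain boundary and building a boundary model, extracting functional requirements and building a feature model, and refining the domain analysis and building an architecture model. The facet terms in each layer correspond to the feature terms in the boundary model, feature model, and architecture model, respectively. Because facet terms stand in general/special relationships, and term-to-asset matching should reflect those relationships, the facet description file is encoded, and the properties of Dewey encoding are used to determine all sub-terms of a term, generate the set of matching facet terms, and compute term weights.

Finally, the thesis describes the design and implementation of the asset retrieval module in detail. The module is implemented as the models of the MVC pattern, and the key techniques and core code behind these models are presented.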
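
As a rough illustration of the keyword-extraction and inverted-index step summarized above, the following Python sketch applies forward maximum matching against a manually specified keyword list and then maps each keyword to the assets whose text contains it. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the asset identifiers, and the use of plain strings in place of parsed XML manifests are all hypothetical.

```python
from collections import defaultdict


def forward_max_match(text, keywords):
    """Scan left to right; at each position take the longest listed keyword."""
    max_len = max((len(k) for k in keywords), default=0)
    found = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in keywords:
                found.append(candidate)
                i += size
                break
        else:
            i += 1  # no keyword starts at this position; skip one character
    return found


def build_inverted_index(manifest_texts, keywords):
    """Map each keyword to the set of asset ids whose text contains it."""
    index = defaultdict(set)
    for asset_id, text in manifest_texts.items():
        for keyword in forward_max_match(text, keywords):
            index[keyword].add(asset_id)
    return index


# Hypothetical keyword list and manifest texts, for illustration only.
KEYWORDS = {"asset", "asset retrieval", "facet"}
MANIFESTS = {
    "A1": "asset retrieval by facet",
    "A2": "facet scheme",
}
INDEX = build_inverted_index(MANIFESTS, KEYWORDS)
```

Because the longest candidate is tried first, "asset retrieval" is recorded for A1 rather than the shorter "asset". Matching here is character-based, which suits unsegmented Chinese text; for space-delimited text a token-based variant would avoid matches inside longer words.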

【Abstract】 With the development of the software industry, demand for software is increasing rapidly and software systems are growing in scale. As a result, more and more software enterprises have realized the importance of software reuse. The most effective way for a software enterprise to practice reuse is to reuse its own software assets. The reusable asset management system, which is based on the Reusable Asset Specification presented by the OMG, implements asset description, asset storage and asset retrieval. How to retrieve the assets in the system is a main problem in designing it. A proper retrieval method effectively reduces the cost of retrieval, whereas a crude one makes the system harder to use and results in the failure of software reuse.

According to the state of software reuse in domestic software enterprises and the requirements of those enterprises, this paper proposes keyword retrieval based on asset manifests and domain faceted retrieval. Keyword retrieval suits enterprises in the early stage of implementing a reusable asset management system, and domain faceted retrieval suits them in the mature stage. Together, the two methods adapt to the growth of developers' experience with software reuse. This paper mainly focuses on the implementation of these two retrieval methods. In the course of the research, we improve some mature retrieval methods and apply them to the reusable asset management system.

Firstly, the paper studies the Reusable Asset Specification and creates asset manifests based on it. Asset manifests are XML files that contain the metadata of assets. We create the descriptions of asset keywords and facet terms under the classification element of the asset manifest. These descriptions are used to create the inverted lists of assets.

Secondly, the paper studies extracting keywords from the asset manifest, encoding the asset manifest and creating the inverted list. For keyword extraction, we propose manually defining the keyword list, which removes the need for a word-segmentation dictionary for the software reuse domain, and we use a forward matching word segmentation algorithm to extract keywords from sentences. To help users find the most relevant information in an asset, the paper studies how to find the smallest common ancestor of the keywords in a manifest tree by analyzing the Dewey IDs of the nodes. To sort the retrieval results, we study how to calculate the correlation between an asset and the keyword list, which depends on keyword statistics and the positions of the keywords in the asset manifest.

Moreover, the paper proposes an improvement on the traditional facet scheme. After analyzing the shortcomings of the traditional facet scheme, we present a domain facet scheme based on FODA (Feature-Oriented Domain Analysis), which has three layers. Each layer in this facet scheme corresponds to a phase of FODA, and the terms in each layer correspond to terms in the context model, the feature model and the architecture model, respectively. In implementing domain faceted retrieval, the terms in a facet stand in ancestor-descendant relationships. To reflect these relationships when matching assets, we encode the facet manifest so that we can find all descendants of a term, create the matching-term list and calculate term weights.

Lastly, the paper describes the design of the asset retrieval modules. We implement these modules as the models of the MVC framework.
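
The two uses of Dewey codes mentioned in the abstract, finding a common ancestor in the manifest tree and finding the descendants of a facet term, both reduce to prefix comparisons on the codes. The sketch below is a simplified illustration under stated assumptions: codes are dot-separated integers such as `1.2.3`, the function names are hypothetical, and `common_ancestor` handles only one node per keyword, whereas a full smallest-common-ancestor search over a keyword query would have to consider every node in which each keyword occurs.

```python
def parse_dewey(code):
    """Turn a Dewey code such as '1.2.3' into a tuple of ints (1, 2, 3)."""
    return tuple(int(part) for part in code.split("."))


def common_ancestor(codes):
    """Return the Dewey code of the deepest node that is an ancestor of,
    or equal to, every given node: the longest common prefix of the codes."""
    paths = [parse_dewey(code) for code in codes]
    prefix = []
    for parts in zip(*paths):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return ".".join(str(part) for part in prefix)


def is_descendant(code, ancestor):
    """True if `code` lies strictly below `ancestor` in the tree."""
    c, a = parse_dewey(code), parse_dewey(ancestor)
    return len(c) > len(a) and c[:len(a)] == a


def descendants(term_codes, ancestor):
    """Names of all terms whose Dewey code lies below `ancestor`."""
    return sorted(
        name for name, code in term_codes.items() if is_descendant(code, ancestor)
    )


# Hypothetical facet terms and codes, for illustration only.
TERMS = {"database": "1.1", "relational": "1.1.1", "network": "1.10"}
```

Comparing parsed components rather than raw strings matters: as a string, `1.10` starts with `1.1`, but the node `1.10` is a sibling of `1.1`, not a descendant of it.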

【Key words】 Dewey code; domain; facet; retrieval; reusable asset
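  • 【Online publication contributor】 Donghua University
  • 【Online publication year and issue】 2008, Issue 08
  • 【Classification number】 TP311.52
  • 【Download count】 38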