节点文献

Active XML数据管理基础问题研究

The Research on Basic Problems of Active XML Data Management

【作者】 马海涛

【导师】 郝忠孝;

【作者基本信息】 哈尔滨工业大学 , 计算机软件与理论, 2009, 博士

【摘要】 Active XML(Active eXtensible Markup Language)的提出,能够有效的解决当前分布式数据管理中存在的数据源异构性、交互性及自主性问题,为分布式Web数据管理提供了新的发展方向。AXML文档是一部分数据直接给出,另一部分数据以Web服务调用方式隐含给出的XML文档,通过触发这这些服务调用,可以获得其包含的隐含信息来扩充文档内容。AXML模式定义了符合约束条件的AXML文档集合。AXML数据管理需要考虑如下基础问题:(1)AXML数据交换是AXML的主要应用方式,而数据交换之前必须判定给定AXML文档通过触发其包含的服务调用是否能够转换成为符合目标模式要求的文档实例,从而引出了文档重写问题;(2)在某些情况下,还要考虑符合给定源模式的全部文档是否能够重写为目标模式实例,这就需要考虑模式之间的兼容性,该问题为模式重写问题;(3)AXML数据交换过程中,通常以查询方式来实现数据请求,而查询可满足性判定是执行给定查询的前提条件,通过判定给定查询的可满足性,可以过滤掉一部分不可满足查询,从而提高查询的执行效率;(4)保证AXML文档为有效文档是AXML数据管理的关键,也是AXML数据交换、文档查询的先决条件。本文基于树自动机理论,对AXML数据交换中存在的AXML文档重写和模式重写、AXML文档查询可满足性、AXML文档有效性检验问题进行了深入研究,目的是对上述问题提出有效的解决方法,从而让AXML能够更好的服务于分布式数据管理。第一,研究了AXML文档重写和模式重写问题。AXML文档重写问题是指判定给定文档通过触发其包含的服务调用是否能够将其转换成为符合目标模式的文档实例。AXML文档重写问题分为可能重写和安全重写,AXML文档可能重写是判定给定文档是否能够重写为目标模式的某一文档实例;AXML文档安全重写是判定给定文档的全部可生成文档是否能够重写为符合目标模式的文档实例。AXML模式重写问题是指判定符合给定源模式的全部文档是否能够重写为目标模式实例。首先,基于传统树自动机理论,定义了用于抽象描述AXML文档树的ADTA机(AXML DocumentTree Automata),基于ADTA机,给出了多项式时间复杂度的AXML文档可能重写判定算法,给出了算法的正确性证明;在ADTA机的基础上,定义了ADTA机补自动机,提出了多项式时间复杂度的AXML文档安全重写判定算法,给出了算法的正确性证明;然后,定义了用于描述AXML模式的ASTAr机(AXML Schema Tree Automata for Rewriting),给出了ASTAr机构造算法,ASTAr机定义了所有符合给定AXML模式约束的AXML文档集合;最后,通过分析AXML模式包含与模式重写的关系,基于ASTAr机,提出了多项式时间复杂度的AXML模式重写判定算法,分析了算法的正确性和有效性。第二,研究了模式约束下的AXML文档树模式查询可满足性问题。AXML文档查询可满足性问题是指判定符合给定模式约束的AXML文档是否满足给定查询表达式。首先,给出了AXML文档查询可满足性的形式化定义;然后,定义了用于抽象AXML模式的ASTAq机(AXML SchemaTree Automata for Queries),用于描述符合给定AXML模式约束的文档集合,定义了抽象树模式查询的TPQA机(Tree Pattern Query Automata),TPQA机描述了包含满足给定树模式查询表达式路径的文档集合;最后,基于ASTAq机和TPQA机,针对XPath树模式查询片段{“/,//,[ ]”},提出了一种多项式时间的AXML文档查询可满足性检验算法,分析了算法的正确性和有效性。第三,研究了AXML文档有效性检验问题。AXML文档有效性检验问题是指给定AXML文档及其服务调用规范,检验文档是否符合目标模式。定义了用于抽象AXML模式的ASTAv机(AXML Schema Tree Automata forValidation),该树自动机描述符合目标模式约束的文档集合,能够完成对给定文档当前状态的有效性检验;基于ASTAv机,通过分析服务规范与目标模式之间的关系,提出了一种多项式时间的AXML文档有效性检验算法,分析了算法的正确性和有效性。

【Abstract】 The presentation of Active XML(AXML for short) addresses the problems ofheterogeneity, interoperability and autonomy occurring in data management at thescale of the Web and becomes a new powerful tool for distributed data management.An AXML document is an XML document where some of the data is given explicitlywhile other parts are defined only intentionally by means of embedded calls to Webservices. When one of these calls is invoked, its results will be returned to enrich theoriginal document.The problem of AXML data management consists of the following problems:(1)AXML Data exchange is the main application and the sender must decide whetherthe given AXML document can be rewritten into a new one conforming to the gargetschema by invoking the embedded service calls, which introduced schema rewriting;(2)The applications sometimes may consider whether all the documents conformingto the original schema can be rewritten to the target schema, named schema rewrit-ing; (3)In AXML data exchange, applications often ask data in querying manner andsatisfiability is the first condition before executing the given query. After deciding thesatisfiability of querying AXML documents, the unsatisfied ones will be refused andimprove the efficiency of queries. (4)Validation of AXML documents is the key ofAXML data management and the first condition of AXML data exchange and query-ing documents.Based on tree automata theory, document rewriting and schema rewriting, sat-isfiability of querying documents and validation of documents that are studied in thethesis. The goal of this thesis is to propose efficient algorithms to address these prob-lems and make AXML to be suitable for data management of the Web.First, problems of AXML document rewriting and schema rewriting are studied.AXML document rewriting is to decide whether the given document can be translatedto the new one conforming the garget schema by invoking some service calls embed-ded in it. AXML document rewriting contains two types: possible rewriting and saferewriting. The former is to decide whether the given document can be rewritten intoanother one conforming to the target schema; the latter is to decide whether the set of produced documents from the given AXML document can be rewritten to the docu-ments conforming to the garget schema. Schema rewriting is to decide whether all thedocuments conforming to the given AXML schema can be translated to the new onesof the target schema. Firstly, the AXML Document Tree Automata (ADTA) used torepresent AXML documents is defined, together with the building algorithm. Basedon ADTA and the defined complement of ADTA, both of algorithms, performed inpolynomial time, for deciding AXML document possible rewriting and safe rewritingare presented and the correction of them are analyzed. Secondly, the AXML SchemaTree Automata for rewriting (ASTAr) used to represent AXML schemas is also de-fined, together with the building algorithm is presented. Finally, based on ASTAr, analgorithm for deciding AXML schema rewriting is proposed which is performed inpolynomial time by analyzing the relationship between the AXML schema contain-ment and schema rewriting; the correction and efficiency are also given.Second, problem of satisfiability of querying AXML documents conforming toa given AXML schema is studied. For the efficient evaluation of a query over anAXML document, one should check whether there exists an (A)XML document ob-tained from the original one by invoking some Web services, on which the queryhas a non-empty answer. firstly, the formal definition of satisfiability of queryingAXML documents is defined. Then, a new tree automaton, named AXML SchemaTree Automata for Queries (ASTAq), is defined which can efficiently represent the setof AXML documents conforming to the given schema; a TPQA (Tree Pattern QueryAutomaton) is also defined which can represent the document set of satisfying querypathes of the given tree pattern query. Finally, based on ASTAq and TPQA, an al-gorithm for checking satisfiability of tree pattern queries for AXML documents thatruns polynomial time is proposed and experiments were made to verify the utility ofsatisfiability checking.Third, the problem of validating AXML documents is studied. Validation ofAXML document is to check whether a given AXML document with service callsspecification conforms the target schema. A new tree automaton, named AXMLSchema Tree Automaton for validation (ASTAv), is defined which can efficientlyrepresent the set of AXML documents conforming to the given schema and checkthe validation of the current state of an AXML document. Based on ASTAv, an al-gorithm is proposed for checking AXML validation performing in polynomial time through analyzing the relationship between the service calls specification and the tar-get schema. Finally, the experiment results show that our algorithm gives rise to anefficient validation method for AXML documents.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络