XMLç»“æž„ç´¢å¼•æŠ€æœ¯åŠæŸ¥è¯¢ä¼˜åŒ–ç ”ç©¶

Study on Structural Index Technology and Query Optimization for XML

【Author】 Hao Songtao (郝松涛)

【Author information】 Chongqing University, Computer Software and Theory, 2003, Master's thesis

【Abstract】 Many index techniques and join algorithms [12,13,14,15,16,23,24] have been proposed in recent years to optimize XML queries. These indices are built mainly on edge labels (tags) and element values. Some of them do not cover all element nodes, so many paths must still be examined at query time; others produce large amounts of redundant data during forward or backward traversal, which raises query cost. Among the proposed join algorithms, some, such as MPMGJN [23], outperform standard RDBMS join algorithms but perform a great deal of unnecessary computation and I/O when matching basic structural relationships, especially parent-child relationships. Others, such as Stack-Tree-Desc [24], represent the state of the art in structural joins but scan the input lists sequentially instead of using an index, so I/O is wasted on elements that do not participate in the join and the join is slowed down. To address these problems, this thesis makes the following contributions:

① Representing the structure of an XML document with the conventional Numbering Schema makes element updates inconvenient, so this thesis proposes an improved method, the Sparse Numbering Schema. Compared with the conventional method, inserting a new node does not require recomputing the start and end values of existing nodes, which makes updates to the tree structure more efficient; the tree is built in a single traversal of the document, which further reduces construction cost; and the schema gives the index a relatively durable and stable reference.

② Since storage methods for numbering schemas have received little study, this thesis gives a method for storing the Sparse Numbering Schema in a relational database. This method makes it easy to build an index quickly on the start value and also saves storage space.

③ Combining the B+-tree indexing technique of relational databases with the Sparse Numbering Schema, this thesis proposes a new index structure for XML documents, the B+-tree structural index, which plays an important role in optimizing join operations and element location in XML queries. The index is then improved by adding pointers, yielding a B+-tree structural index with sibling pointers (B+-sp for short). With this index, an element lookup no longer always has to start from the root of the tree.

④ Based on the B+-sp index, this thesis gives the Anc-Desc-B+-sp join algorithm. Theoretical analysis shows that its time complexity, O(|A|+log|A|), is clearly lower than the O(|A|+|D|+|outlist|) of the Stack-Tree-Desc algorithm [24], which does not use the index: since |D| ≥ |A|, |D|+|outlist| >> log|A|. Preliminary experiments indicate that the algorithm is an effective and fast join algorithm.

⑤ Another important factor in XML query time is locating the XML data sources involved. To locate them quickly, this thesis proposes a framework for a distributed XML data source location system, the Cooperative XML Search Engine (CXSE). CXSE shortens collection time through site-selection-based search and scoring of XML data sources, and thereby locates XML data sources quickly and accurately. In particular, the parallel search technique is also of some benefit when an XML query involves several XML data sources at once.
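The abstract does not give the details of the Sparse Numbering Schema or of the B+-sp index, so the following is only a minimal sketch of the general idea, under stated assumptions. Labels are (start, end) intervals assigned in one depth-first pass with a fixed spacing (`GAP`, a hypothetical parameter), a node is a descendant of another when its interval lies strictly inside the other's, and a sorted list searched with `bisect` stands in for the B+-tree on the start column. Sibling pointers are not modeled, and all function names are illustrative rather than taken from the thesis.

```python
import bisect
import xml.etree.ElementTree as ET

GAP = 100  # assumed spacing between consecutive labels; leaves room for inserts


def label(root, gap=GAP):
    """Assign (tag, start, end) to every element in a single depth-first pass."""
    rows, counter = [], 0

    def visit(elem):
        nonlocal counter
        counter += gap
        start = counter
        slot = len(rows)
        rows.append(None)  # reserve the slot so rows stay in document order
        for child in elem:
            visit(child)
        counter += gap
        rows[slot] = (elem.tag, start, counter)

    visit(root)
    return rows  # sorted by start, because document order equals start order


def insert_last_child(rows, parent, tag):
    """Add a leaf in the free gap before the parent's end; no existing label changes."""
    _, p_start, p_end = parent
    inner_ends = [e for (_, s, e) in rows if p_start < s and e < p_end]
    lo = max(inner_ends, default=p_start)
    width = (p_end - lo) // 3
    if width < 1:
        raise ValueError("gap exhausted; relabeling would be needed")
    new = (tag, lo + width, lo + 2 * width)
    rows.append(new)
    rows.sort(key=lambda r: r[1])
    return new


def anc_desc_join(rows, anc_tag, desc_tag):
    """Pair each anc_tag element with the desc_tag elements nested inside it."""
    desc = [r for r in rows if r[0] == desc_tag]
    starts = [r[1] for r in desc]  # stand-in for an index on the start column
    out = []
    for anc in (r for r in rows if r[0] == anc_tag):
        # Jump straight to the first candidate instead of scanning from the front.
        i = bisect.bisect_right(starts, anc[1])
        while i < len(desc) and desc[i][1] < anc[2]:
            out.append((anc, desc[i]))
            i += 1
    return out


doc = ET.fromstring("<lib><book><title/><author/></book><book><title/></book></lib>")
rows = label(doc)
second_book = rows[4]
insert_last_child(rows, second_book, "author")
for anc, desc in anc_desc_join(rows, "book", "author"):
    print(anc, desc)
```

Because properly nested intervals never partially overlap, any element whose start falls between an ancestor's start and end is one of its descendants, which is why the join only needs to compare start values once the search has positioned it.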

【Key words】 XML; Numbering Schema; storage; B+-tree structural index; join algorithm

【Online publication submitter】 Chongqing University

【Classification number】 TP311.1
【Times cited】 4
【Downloads】 294
