Research on Adaptive Web Information Acquisition Service Techniques

Research on Adaptive Techniques for Web Information


【Author】 Liu Kangmiao

【Author information】 Zhejiang University, Computer Science and Technology, 2008, PhD

【Abstract】 The rapid development of web technology has greatly enriched the information resources available to users. However, these resources are disordered and of uneven quality, which makes it difficult for users to find the information they need. A Web Information Acquisition Service (WIAS) uses modern information technology to provide individual users with the Web information products and services that meet their information needs; its two service modes are information pull and information push. Adaptive techniques for WIAS dynamically adjust the behavior of the service according to users' needs, the characteristics of information sources, system load, and other factors, so that high-quality information is delivered efficiently and in a user-friendly way.

An accurate and complete understanding of users' information needs is the foundation of WIAS. Web users are both consumers and producers of Web information, so their needs can be inferred by analyzing the content they browse, their behavior, and the information they publish. Once those needs are known, the keys to a successful WIAS are selecting relevant results from the vast body of Web resources and presenting them in a more user-friendly way. In addition, users usually expect information to be delivered promptly, so ensuring the performance of the acquisition system is also an important research topic.

To address these issues, this thesis first proposes an adaptive information pull technique based on measuring query ambiguity. The ambiguity of each user request is quantified, and the way results are presented is chosen adaptively according to that measure. For result selection and presentation, a multi-feature fusion ranking algorithm and a clustering algorithm are proposed, respectively. Both are validated on two representative kinds of emerging Internet resources: multimedia resources (using images as the example) and frequently updated dynamic resources (using blogs as the example).

Second, adaptive information push techniques based on user modeling are proposed for the two roles in Web activity, information publishers and information browsers. For publishers, blogs, a popular personal publishing platform, serve as the research environment: a method is proposed that models users' long-term and short-term interests from their blog posts, and the blogspace is partitioned into communities so that bloggers with similar interests can be recommended as friends. For browsers, the content of the page currently being viewed is taken as evidence of the user's profile, and a contextual advertising method based on sentiment and topic analysis is proposed. The method ensures that the advertisements pushed are not only topically relevant but also consistent with the user sentiment implicit in the page content, which makes them better targeted.

Third, to meet the performance and scalability requirements of WIAS, the thesis takes the search engine, a typical information pull application, as its starting point and proposes a hybrid distributed index organization strategy named Loc-Glob, which combines high performance with good scalability. Performance is further optimized on top of Loc-Glob. To balance the workload across index servers, the index is redistributed and replicated based on the load of index terms and the dynamically changing query stream. The query path across index servers is also optimized adaptively according to the servers' real-time load, which improves load balancing.

Based on this work, a prototype blogspace information acquisition system that adopts the adaptive techniques is designed and implemented. It provides several application services, including a blog search engine, blog friend recommendation, and advertisement recommendation, and validates the feasibility of the adaptive techniques proposed for the two service modes, information pull and information push. Finally, conclusions and future work are presented.

【Key words】 Web Information Acquisition; Adaptive Technique; Information Pull; Information Push; Query Ambiguity; User Modeling; Indexing Organization Strategy

【Online publication contributor】 Zhejiang University

【Classification number】 G250.73
【Citations】 2
【Downloads】 711