
Research and Implementation of Semantic Filtering for the XML Engine Security Gateway


【Author】 Wu Hongjuan (吴红娟)

【Author information】 University of Electronic Science and Technology of China (电子科技大学), Software Engineering, 2009, Master's thesis

【Abstract】 Within the vast and heterogeneous body of information on the Internet, harmful content reaches people in many forms and through many channels, and affects them in many ways. Necessary and effective filtering of such content is therefore especially important for building a healthy and safe Internet environment. Traditional text filtering algorithms, however, can only judge a text at the level of structural correspondence and cannot capture its semantics, so they struggle to meet current demands for intelligent information processing. Drawing on computational linguistics, this thesis proposes and implements a filtering method based on semantic analysis. Long texts that slip past keyword-matching filters can be identified and handled well through semantic analysis, which effectively prevents large volumes of harmful spam from spreading.

The contributions of this thesis are as follows.

First, to address the problems that arise in existing automatic word segmentation methods, the thesis refines the concept of an intelligent dictionary with a self-learning mechanism and implements a basic model of such a dictionary. While segmenting, the model learns new words on its own without human intervention, which gives the system its intelligent behavior. The segmentation algorithm combines forward and backward maximum matching, which greatly improves segmentation accuracy. Used together with a word-frequency library, it also resolves segmentation ambiguity effectively, which further safeguards accuracy.

Second, following an in-depth study of feature-value algorithms, the thesis builds on the TFIDF feature extraction algorithm and, relying on the stability of TFIDF, introduces a part-of-speech coefficient to improve feature-set selection. Using a latent semantic tagging approach, each feature is multiplied by a coefficient that depends on its part of speech. This highlights how well features of different parts of speech represent the document category, which reduces the workload of the text classifier and further improves processing speed and quality.

Third, after studying several mainstream classifier algorithms, and given the high performance and low complexity of the Bayes algorithm and the practical conditions of the project (large batches, high speed requirements, and few categories), the thesis proposes a classifier model based on the naive Bayes algorithm. The model uses the part-of-speech coefficients of the feature values together with statistical methods to train on and classify the texts to be classified. Experiments show that the classifier achieves high recall and precision, which effectively guarantees the filtering quality of the semantic filtering module as a whole.

The results of this research have been applied to the XML Engine security gateway, a project of the national support program and the Guangdong Province science and technology program. Adding the semantic filtering module to XML Engine greatly strengthens its intelligent blocking of large volumes of harmful information and further ensures the security of XML Engine as a whole.
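The combination of forward and backward maximum matching with a word-frequency library, as described in the abstract, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the toy frequency table `TOY_FREQ`, and the tie-breaking rule (prefer fewer tokens, then the higher summed word frequency) are all assumptions made for illustration, and the self-learning of new words is not modeled.

```python
from typing import Dict, List


def forward_max_match(text: str, vocab: Dict[str, int], max_len: int) -> List[str]:
    """Scan left to right, always taking the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or word in vocab:
                tokens.append(word)
                i += size
                break
    return tokens


def backward_max_match(text: str, vocab: Dict[str, int], max_len: int) -> List[str]:
    """Scan right to left, always taking the longest dictionary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            word = text[j - size:j]
            if size == 1 or word in vocab:
                tokens.append(word)
                j -= size
                break
    tokens.reverse()
    return tokens


def segment(text: str, freq: Dict[str, int]) -> List[str]:
    """Run both scans; if they disagree, pick one using word frequencies.

    Assumed rule (not from the thesis): fewer tokens wins, and ties are
    broken by the higher summed frequency of the tokens.
    """
    max_len = max((len(w) for w in freq), default=1)
    fwd = forward_max_match(text, freq, max_len)
    bwd = backward_max_match(text, freq, max_len)
    if fwd == bwd:
        return fwd

    def cost(tokens: List[str]):
        return (len(tokens), -sum(freq.get(t, 0) for t in tokens))

    return min((bwd, fwd), key=cost)


# Hypothetical word-frequency table, for illustration only.
TOY_FREQ = {"研究": 50, "研究生": 20, "生命": 40, "的": 100, "起源": 30, "命": 2}

if __name__ == "__main__":
    print(segment("研究生命的起源", TOY_FREQ))
```

With this toy table, the forward scan yields 研究生 / 命 / 的 / 起源 and the backward scan yields 研究 / 生命 / 的 / 起源. Both contain four tokens, so the summed frequency decides in favor of the backward result.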

【Key words】 semantic filtering; XML Engine; word segmentation; text classifier

【Online publication submitter】 University of Electronic Science and Technology of China

【Classification number】 TP391.1
【Times cited】 1
【Downloads】 44