åŸºäºŽSVMå’ŒTSVMçš„ä¸æ–‡å®žä½“å…³ç³»æŠ½å–

SVM and TSVM Based Chinese Entity Relation Extraction

【Author】 Xu Fen (徐芬)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2007, Master's thesis

【Abstract (translated from the Chinese)】 Information extraction automatically converts unstructured text into structured text. It can stand alone as a system that meets a strong practical need, and it is also an important enabling technology for other applications such as information retrieval, text classification, and automatic question answering. Entity relation extraction is a key step in information extraction and is becoming an increasingly popular research topic. Work on Chinese entity relation extraction is still at an early stage, and much remains to be done. Based on the characteristics of Chinese entity relations, this thesis designs a series of features, including words, part-of-speech tags, entity attributes and mention information, overlap relations between entities, and concept information provided by HowNet. These features form a context feature vector for the relation between two entities, and an SVM classifier is used to extract Chinese entity relations. With the ACE 2004 training corpus as experimental data, good recognition performance is obtained. Based on the results of the level-by-level experiments, the thesis examines in detail how the various feature sets and different numbers of training examples affect Chinese entity relation extraction performance. The results show that tasks of different granularity should use combinations of feature sets at different levels of abstraction. The part-of-speech feature set suits the relation detection task; the HowNet concept feature set suits relation type and subtype recognition; the word feature set is the most basic feature set; and the entity overlap feature set contributes most to extraction performance. Enlarging the training corpus improves recognition performance, so building a fairly large training corpus is necessary when using an SVM classifier. Once the corpus reaches a certain size, however, further growth has a weaker effect on performance, and attention should shift mainly to feature set construction. Building on this work, and to address the SVM's dependence on a large training corpus, the semi-supervised learning method TSVM is introduced into Chinese entity relation extraction. Experiments show that TSVM far outperforms SVM when the number of training vectors is very small, but falls behind SVM once the number of training vectors is larger. On a relatively simple problem such as relation detection, a TSVM classifier achieves good performance with only a small amount of labeled data and a large amount of unlabeled data, which lowers the cost of the extraction system and improves its portability. On the more complex problem of relation category recognition, TSVM performance is still unsatisfactory, and other semi-supervised learning methods should be considered. The thesis also studies and implements the construction of a multi-class TSVM classifier. Future work has two parts: first, improving the existing feature set, for example by adding features such as chunk recognition and HowNet concept structure, to raise extraction performance, together with more precise parameter selection; second, quantitatively studying how the choice of labeled data affects performance and how the amount of labeled data required by SVM and TSVM behaves.

【Abstract】 Information extraction automatically transforms unstructured text into structured text. It can form a system in its own right that meets a strong demand, and it also provides a basis for other applications such as information retrieval, text categorization, and question answering. Entity relation extraction is an important part of information extraction and is attracting growing interest from researchers. Chinese entity relation extraction still needs much further study. This thesis presents work on Chinese entity relation extraction. We design a context vector using several features, including words, part-of-speech tags, entity and mention information, overlap, and HowNet concepts. Based on this context information, we apply an SVM classifier to detect and classify the relations between entities. We take the ACE 2004 training data as our experimental data and obtain encouraging results. The experimental results are analyzed in detail to investigate the impact of the various features and of the number of training examples on extraction performance. The results indicate that different features should be chosen for different extraction tasks. Part-of-speech features are suitable for the relation detection task, while HowNet concept features are appropriate for the relation type and subtype characterization tasks. Word features are the most basic, and overlap features contribute most. Performance rises as the number of training examples increases, so a large corpus is necessary when using an SVM classifier. Once the corpus reaches a certain size, however, the gain from adding more training examples becomes small, and other ways of improving extraction performance are needed, for instance developing more features. To address the dependence of the SVM method on a large corpus, we introduce the semi-supervised learning method TSVM into relation extraction to see whether it can improve extraction performance by using both labeled and unlabeled data. The experiments show that TSVM performs much better than SVM under the same conditions when labeled examples are very few, while SVM performs slightly better than TSVM when there are many labeled examples. TSVM performs well on the relation detection task, which makes it practical for this kind of task. On the relation type recognition task, however, TSVM does not perform well, so other semi-supervised learning methods should be explored. A multi-class TSVM classifier is also constructed. Future work includes developing more features, such as chunking information and HowNet concept structure, to improve extraction performance, choosing parameters for the classifier, and investigating how many examples SVM and TSVM require.
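As a rough illustration of the context feature vector described in the abstract, the following Python sketch builds sparse binary features for one candidate entity pair from words, part-of-speech tags, entity types, and mention overlap. It is only a sketch under stated assumptions: the `Mention` class, the feature names, the example sentence, and the tag set are hypothetical and are not taken from the thesis, and HowNet concept features are omitted because they require the HowNet lexicon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    entity_type: str  # hypothetical ACE-style label such as "PER" or "ORG"
    start: int        # index of the first token of the mention
    end: int          # index one past the last token of the mention


def overlap_feature(m1: Mention, m2: Mention) -> str:
    """Describe how the two mention spans are positioned relative to each other."""
    if m1.end <= m2.start or m2.end <= m1.start:
        return "separate"
    if m1.start <= m2.start and m2.end <= m1.end:
        return "m2_in_m1"
    if m2.start <= m1.start and m1.end <= m2.end:
        return "m1_in_m2"
    return "partial"


def context_features(tokens, pos_tags, m1, m2):
    """Collect binary feature names for one candidate entity pair."""
    feats = {
        f"type1={m1.entity_type}",
        f"type2={m2.entity_type}",
        f"overlap={overlap_feature(m1, m2)}",
    }
    # Words and POS tags strictly between the two mentions
    # (the range is empty when the mentions touch or overlap).
    left, right = sorted((m1, m2), key=lambda m: m.start)
    for i in range(left.end, right.start):
        feats.add(f"word_between={tokens[i]}")
        feats.add(f"pos_between={pos_tags[i]}")
    return feats


def vectorize(feature_sets):
    """Assign a column index to each feature name and return sparse 0/1 rows."""
    index = {}
    vectors = []
    for feats in feature_sets:
        row = {}
        for name in sorted(feats):
            col = index.setdefault(name, len(index))
            row[col] = 1.0
        vectors.append(row)
    return index, vectors


# Hypothetical example sentence with an assumed segmentation and tag set.
tokens = ["微软", "总裁", "盖茨", "访问", "北京"]
pos_tags = ["NR", "NN", "NR", "VV", "NR"]
org = Mention("ORG", 0, 1)
per = Mention("PER", 2, 3)
feats = context_features(tokens, pos_tags, org, per)
index, vectors = vectorize([feats])
```

Sparse rows of this kind could then be handed to an SVM trainer for the labeled pairs, and to a transductive SVM trainer together with rows for unlabeled pairs. The actual feature templates and classifier settings used in the thesis are not given in this record.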

【Keywords (translated from the Chinese)】 information extraction; entity relation extraction; SVM; TSVM; feature selection; number of training examples; multi-class classifier
【Key words】 information extraction; entity relation extraction; SVM; TSVM; feature selection; training example quantities; multi-TSVM

【Online publication contributor】 National University of Defense Technology

【Classification number】TP391.1
【Citation count】9
【Download count】461