Using SVMs for Filtering Translation Tables

Translation lexicons are known to improve the quality of parallel corpora alignment at sub-sentence granularity and the quality of newly extracted translations and, as a consequence, of Machine Translation and cross-language information retrieval. The bilingual pairs (entries) in such lexicons must be correct if they are to improve the quality of these applications. This paper proposes a method that uses a Support Vector Machine based classifier to classify bilingual entries automatically extracted from aligned parallel corpora as correct or incorrect. Experimental results show that the classification approach achieved a micro F-measure higher than 85% for the English-Portuguese language pair.

Keywords: Translation equivalents, Translation lexicon, Translation tables, Bilingual translation pairs, Phrase table filtering, Classification, Support Vector Machine, SVM
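
As an illustration of the evaluation metric named in the abstract, the following minimal sketch computes a micro-averaged F-measure for entries labelled correct or incorrect. It assumes the standard definition (true positives, false positives and false negatives pooled over all classes before computing precision and recall); the abstract does not state the exact formulation used in the paper. The function name `micro_f_measure` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure: pool TP/FP/FN over all classes, then compute F."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # entry assigned to its true class
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[g] += 1   # true class gains a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    if total_tp == 0:
        return 0.0

    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five bilingual entries (illustrative data only).
gold = ["correct", "correct", "incorrect", "correct", "incorrect"]
predicted = ["correct", "incorrect", "incorrect", "correct", "correct"]
score = micro_f_measure(gold, predicted)
```

Under this definition, when every entry receives exactly one label and both classes are pooled, the micro-averaged precision and recall are equal, so the score coincides with plain accuracy.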

Proceedings of the 15th Portuguese Conference on Artificial Intelligence, EPIA 2011, Lisbon, October 2011

Editors: Rui Prada, Sofia Pinto and Luís Antunes

ISBN: 978-989-95618-4-7

Publisher: Instituto Superior Técnico (Portugal)

Pages: 690 to 702

Date: October, 2011


