Identifying Bilingual Segments for Translation Generation

We present an approach that uses known translation forms in a validated bilingual lexicon to identify bilingual stem and suffix segments. By taking the longest sequence common to a pair of orthographically similar translations, we first induce bilingual suffix transformations (replacement rules). Redundant analyses are discarded by examining the distribution of stem pairs and their associated transformations. Sets of bilingual suffixes that conflate various translation forms are then grouped, and stem pairs sharing similar transformations are subsequently clustered; these clusters serve as the basis for the generative approach. The primary motivation behind this work is to eventually improve lexicon coverage by using the correct bilingual entries to suggest translations for out-of-vocabulary (OOV) words. In our preliminary results, 90% of the generated translations are correct when both bilingual segments of the pair being analysed (the bilingual stem and the bilingual suffix) are known to have occurred in the training data set.
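The abstract describes the induction and generation steps only in prose, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that the "longest common sequence" is the longest common prefix of each side of two lexicon entries, so that the prefix is taken as the stem and the remainder as the suffix. The names `induce_rules` and `generate`, and the `min_stem` threshold (a crude stand-in for orthographic similarity), are hypothetical. The pruning of redundant analyses and the clustering of stem pairs are not reproduced; the counter of rule frequencies is only a possible starting point for them.

```python
from collections import Counter
from os.path import commonprefix
from typing import List, Optional, Tuple

Entry = Tuple[str, str]       # (source word, target word)
Suffix = Tuple[str, str]      # (source suffix, target suffix): a bilingual suffix
Rule = Tuple[Suffix, Suffix]  # replace the first bilingual suffix with the second


def split_at_prefix(a: str, b: str) -> Tuple[str, str, str]:
    """Split two words at their longest common prefix (assumed to be the stem)."""
    stem = commonprefix([a, b])
    return stem, a[len(stem):], b[len(stem):]


def induce_rule(e1: Entry, e2: Entry, min_stem: int = 3) -> Optional[Rule]:
    """Induce one bilingual suffix replacement rule from two lexicon entries."""
    s_stem, s_suf1, s_suf2 = split_at_prefix(e1[0], e2[0])
    t_stem, t_suf1, t_suf2 = split_at_prefix(e1[1], e2[1])
    # Hypothetical similarity test: both sides must share a stem of min_stem chars.
    if len(s_stem) < min_stem or len(t_stem) < min_stem:
        return None
    return (s_suf1, t_suf1), (s_suf2, t_suf2)


def induce_rules(lexicon: List[Entry], min_stem: int = 3) -> Counter:
    """Count induced rules over all ordered pairs of distinct entries."""
    rules: Counter = Counter()
    for i, e1 in enumerate(lexicon):
        for j, e2 in enumerate(lexicon):
            if i != j:
                rule = induce_rule(e1, e2, min_stem)
                if rule is not None:
                    rules[rule] += 1
    return rules


def generate(entry: Entry, rule: Rule) -> Optional[Entry]:
    """Apply a rule to a known entry whose suffixes match the rule's left side."""
    (s_from, t_from), (s_to, t_to) = rule
    src, tgt = entry
    if not (src.endswith(s_from) and tgt.endswith(t_from)):
        return None
    # Slice with explicit lengths so that an empty suffix keeps the whole word.
    return (src[:len(src) - len(s_from)] + s_to,
            tgt[:len(tgt) - len(t_from)] + t_to)


if __name__ == "__main__":
    # Toy English-Spanish entries for illustration only (not data from the paper).
    lexicon = [("correct", "correcto"), ("correctly", "correctamente")]
    rules = induce_rules(lexicon)
    rule = (("", "o"), ("ly", "amente"))
    print(rules[rule])                            # 1
    print(generate(("direct", "directo"), rule))  # ('directly', 'directamente')
```

On the toy pair correct/correcto and correctly/correctamente, the sketch induces the rule ("", "o") → ("ly", "amente") together with its inverse, and applying it to the known entry direct/directo generates directly/directamente. A generated pair is only a suggestion; in the setting described above, it is most reliable when both its bilingual stem and its bilingual suffix were seen in training.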

Published in: Advances in Intelligent Data Analysis XIII, 13th International Symposium, IDA 2014, Leuven, Belgium, October 30 to November 1, 2014. Proceedings

Editors: Hendrik Blockeel, Matthijs van Leeuwen, Veronica Vinciotti

Series: Lecture Notes in Computer Science

Number: 8819

Publisher: Springer Berlin Heidelberg (Germany)

Pages: 167 to 178

Date: October, 2014


