Automatic Error Detection and Correction in Neural Machine Translation: A comparative study of Swedish to English and Greek to English

This is a Master's thesis from Uppsala University/Department of Linguistics and Philology

Author: Anthi Papadopoulou; [2019]

Abstract: Automatic detection and automatic correction of errors in machine translation output are important steps toward ensuring the quality of the final output. In this work, we compared the output of neural machine translation for two language pairs, Swedish to English and Greek to English. The comparison used common machine translation metrics (BLEU, METEOR, TER) and syntax-related ones (POSBLEU, WPF, and WER decomposed over POS classes). Neither the common metrics nor the purely syntax-related ones captured the quality of the machine translation output accurately, but the decomposition of WER over POS classes was the most informative. A sample was taken from each language pair to compare manual and automatic categorization of errors into five categories: reordering errors, inflectional errors, missing words, extra words, and incorrect lexical choices. Both Spearman's ρ and Pearson's r showed a strong correlation with human judgment, with values above 0.9. Finally, based on the results of this error categorization, automatic post-editing rules were implemented and applied, and their performance was evaluated on the sample and on the rest of the data set, with mixed results: on the sample, the rules improved all metrics, while on the rest of the data set their impact was negative. An investigation of this outcome, together with the fact that correction was not possible for Greek because of extremely free reference translations and the lack of error patterns in spoken language, reinforced the view that automatic post-editing is tightly connected to consistency in the reference translation, and suggested that more than one reference translation may be needed to obtain better results when handling machine translation output.
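The abstract singles out WER decomposed over POS classes as the most informative metric. The Python sketch below illustrates the general idea under stated assumptions: a word-level Levenshtein alignment between reference and hypothesis, with each substitution and deletion charged to the POS tag of the reference word, each insertion charged to the POS tag of the hypothesis word, and all counts normalized by the reference length, so that the per-class rates sum to the overall WER. This attribution scheme, the hypothetical function names `align` and `wer_by_pos`, and the Penn Treebank-style tags in the example are illustrative assumptions, not necessarily the exact procedure used in the thesis.

```python
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag)


def align(ref: List[str], hyp: List[str]) -> List[Tuple[str, int, int]]:
    """Word-level Levenshtein alignment.

    Returns (op, ref_index, hyp_index) triples, where op is one of
    'match', 'sub', 'del', 'ins'; a missing index is -1.
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right cell to recover one optimal path.
    ops: List[Tuple[str, int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                ops.append(("match" if cost == 0 else "sub", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", i - 1, -1))
            i -= 1
        else:
            ops.append(("ins", -1, j - 1))
            j -= 1
    ops.reverse()
    return ops


def wer_by_pos(ref: List[Token], hyp: List[Token]) -> Dict[str, float]:
    """Per-POS-class WER (assumed attribution scheme, see lead-in).

    Substitutions and deletions count against the reference word's tag,
    insertions against the hypothesis word's tag; every count is divided
    by the reference length, so the values sum to the plain WER.
    """
    if not ref:
        raise ValueError("reference must not be empty")
    ops = align([w for w, _ in ref], [w for w, _ in hyp])
    errors: Counter = Counter()
    for op, ri, hi in ops:
        if op in ("sub", "del"):
            errors[ref[ri][1]] += 1
        elif op == "ins":
            errors[hyp[hi][1]] += 1
    return {pos: count / len(ref) for pos, count in errors.items()}


# Hypothetical toy example (not data from the thesis).
ref = [("the", "DT"), ("cat", "NN"), ("sat", "VBD")]
hyp = [("the", "DT"), ("cats", "NNS"), ("sat", "VBD"), ("down", "RP")]
per_class = wer_by_pos(ref, hyp)
print(per_class)
print(sum(per_class.values()))  # overall WER
```

For this toy pair, the substitution cat → cats is charged to NN and the inserted word "down" to RP, so each class receives 1/3 and the overall WER is 2/3. Charging insertions to hypothesis tags is one of several possible conventions; a reference-only variant would ignore insertions and no longer sum to the full WER.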

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)