Extending a Text Classifier to Multiple Languages

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Author: Albin Byström; [2021]

Keywords: Natural language processing; Multilingual; Transformer; Word embeddings; Text classification

Abstract: This thesis explores the possibility of extending monolingual and bilingual text classifiers to multiple languages. Two different language models are explored: language-aligned word embeddings and a transformer model. The goal was to take a classifier based on Swedish and English samples and extend it to Danish, German, and Finnish samples. The results show that extending a text classifier, either by aligning word embeddings or by fine-tuning a multilingual transformer model, is possible, but the accuracy varies depending on the language.
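
The abstract does not say how the word embeddings are aligned. As an illustration only, the sketch below uses orthogonal Procrustes, one common alignment technique, which is an assumption here and not necessarily the method used in the thesis. The toy vectors, the dimensions, and the function name procrustes_align are hypothetical. It shows the general idea: learn a rotation that maps one language's embedding space onto another's, so that a classifier trained on the target space can be reused on the mapped vectors.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F.

    src and tgt hold embeddings for the same dictionary word pairs,
    one row per pair (hypothetical toy data in this sketch).
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


rng = np.random.default_rng(0)

# Toy "target language" vectors: 50 dictionary words, dimension 8.
tgt = rng.normal(size=(50, 8))

# Toy "source language" vectors: the same words in a rotated space.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
src = tgt @ q.T

# Learn the rotation from the word pairs and map the source vectors.
W = procrustes_align(src, tgt)
aligned = src @ W

print(np.allclose(aligned, tgt))
```

In this toy setup the source space is an exact rotation of the target space, so the mapping is recovered exactly. Real embedding spaces are only approximately related by a rotation, which is consistent with accuracy varying by language.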
