Swedish biomedical text-mining and classification

This is a Bachelor's thesis from KTH/Hälsoinformatik och logistik (Health Informatics and Logistics).

Authors: Linus Eriksson; Kevin Frejdh; [2020]


Abstract:

Manual classification of text is both time-consuming and expensive. It is nevertheless a necessity within biomedicine, for example to make it possible to quantify biomedical data.

This study investigated two approaches to building text classification models that can understand and classify biomedical texts using only small amounts of training data. The study examined whether a specialized model should be considered a requirement for this purpose, or whether a generic model might suffice. Both models were based on publicly available versions: one specialized in English biomedical texts, the other in ordinary Swedish texts. The Swedish model was introduced to a new domain of texts, while the English model had to work on translated Swedish texts.

The results were generally low, but indicated that the approach using the Swedish model was more reliable, performing almost twice as well as its English counterpart. The study concluded that there is potential in using general models as a base and then fine-tuning them for more specialized fields, even with small amounts of data.

Keywords: NLP, text-mining, biomedical texts, classification, labelling, models, BERT, machine learning, FIC, ICF.
