Sentence based risk classifier using NLP and machine learning

This is a Bachelor's thesis from Högskolan i Halmstad

Authors: David Tran; Hugo Starck; [2023]


Abstract: This project was inspired by the company Dizparc and focuses on classification systems together with certain applications of natural language processing. Classification systems are an extensively researched area dating back to the latter half of the 1900s, and the problem has been formulated in many different ways, up to the more modern approaches of today. There are many approaches to classification systems that apply natural language processing; some existing ones combine word vectorization methods with various algorithms, such as Word2Vec combined with Transformers or Convolutional Neural Networks. Most classification systems that apply natural language processing are found within medical research, and therefore access to data is strictly limited. This project was designed to classify inputs using the machine learning algorithms Multinomial Logistic Regression, Decision Tree, and Random Forest, and to compare the models to see which of them would yield the best results. The results were evaluated based on overall accuracy and on the difference between the lowest and highest accuracy. Confusion matrices were also used to check which classes were the easiest to predict. The evaluation showed a better result for Random Forest when using certain numbers of classes, while Decision Tree was able to reach similar results when using fewer classes. The quantity and quality of the data accumulated may not be sufficient to correctly classify inputs with certain methods.
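
The evaluation criteria named in the abstract (overall accuracy, the difference between the lowest and highest accuracy, and confusion matrices) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation. The class names, labels, predictions, and function names below are hypothetical and invented for illustration; they are not data or results from the thesis.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_recall(matrix):
    """Diagonal entry divided by row sum, i.e. how often each true class is hit."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def accuracy_spread(accuracies):
    """Difference between the highest and lowest accuracy across runs."""
    return max(accuracies) - min(accuracies)


# Hypothetical risk classes and predictions (assumption, not thesis data).
labels = ["low", "medium", "high"]
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "low"]

matrix = confusion_matrix(y_true, y_pred, labels)
overall = accuracy(y_true, y_pred)
recalls = per_class_recall(matrix)

print("confusion matrix:", matrix)
print("overall accuracy:", round(overall, 3))
print("per-class recall:", dict(zip(labels, recalls)))
```

Under these assumptions, the class with the highest per-class recall is the one that is "easiest to predict" in the sense the abstract describes, and `accuracy_spread` applied to the accuracies of repeated runs gives the lowest-to-highest difference for one model.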
