Exploring State-of-the-Art Natural Language Processing Models with Regards to Matching Job Adverts and Resumes

This is a thesis for a professional degree at advanced level from Uppsala University, Division of Systems and Control (Avdelningen för systemteknik)

Abstract: Automating the process of comparing and matching resumes with job adverts is a growing research field. This can be done with Natural Language Processing (NLP), an area of machine learning that enables models to process and learn from human language. This thesis explores and evaluates the application of the state-of-the-art NLP model SBERT to the task of comparing resumes and adverts and calculating a similarity measure between text extracted from them. It also investigates which type of training data produces the best-performing model on this task. The results show that SBERT can be trained quickly on unlabeled data from the HR domain using a Triplet network, and that the resulting models achieve high performance when tested on various tasks. The models are shown to be bilingual, to handle unseen vocabulary, and to capture the concept and descriptive context of entire sentences rather than only individual words. The conclusion is therefore that the models have a good understanding of semantic similarity and relatedness. However, in some cases the models' similarity scores between inputs also become binary. Moreover, it is hard to tune a single model that comprehensively covers a domain as diverse as HR. A model fine-tuned on clean, generic data extracted from adverts shows the best overall performance in terms of loss and consistency.
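
The abstract describes fine-tuning SBERT with a Triplet network and then scoring resumes against adverts by similarity. The sketch below illustrates only the general shape of that objective on plain embedding vectors. It is not the thesis's implementation. The abstract does not state the distance metric or margin, so cosine distance and `margin=0.5` are assumptions. The toy vectors are hypothetical stand-ins for SBERT sentence embeddings. A triplet pairs an anchor (for example, advert text) with a positive (related text) and a negative (unrelated text). The loss reaches zero once the positive is closer to the anchor than the negative by at least the margin.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.5) -> float:
    """Triplet loss max(d(a, p) - d(a, n) + margin, 0).

    Assumption: d is cosine distance (1 - cosine similarity) and the margin
    is 0.5; the thesis's actual metric and margin are not given here.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Hypothetical 2-d "embeddings" standing in for SBERT outputs.
advert = [1.0, 0.0]
related_resume = [0.9, 0.1]
unrelated_resume = [0.0, 1.0]

# Well-ordered triplet: positive is much closer than negative, so loss is 0.
print(triplet_loss(advert, related_resume, unrelated_resume))
# Swapped roles: the model would be penalised and pushed to reorder them.
print(triplet_loss(advert, unrelated_resume, related_resume))
```

After training, matching reduces to embedding a resume and each advert and ranking the adverts by `cosine_similarity`. A score that collapses toward only very high or very low values corresponds to the "binary" behaviour the abstract reports.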
