Friends character classification personality quiz

This is a Bachelor's thesis from Uppsala universitet/Avdelningen för systemteknik

Abstract: The purpose of this project was to create a personality quiz, grounded in the scientific method and machine learning, that determines which character in the TV series Friends the person taking the quiz is most similar to. The scripts from all the episodes were used to extract features and to create a dataset split into training and test sets. Different models from the Python modules SKLearn and PyTorch, such as neural networks, quadratic discriminant analysis and decision trees, were then trained and evaluated based on their accuracy on the test data, supported by techniques such as the Akaike information criterion, k-fold cross-validation and principal component analysis. Although gradient boosting provided the highest accuracy, at 78%, logistic regression (with an accuracy of 59% after k-fold cross-validation) was chosen as the method to base the quiz on, since analyzing the coefficients of the model made it easier to tell which features were important to which character. Among the features in the training dataset were the 100 most common words for each character and how many times they were spoken. Some of these words were common to all the characters, and these were selected as the basis for 10 yes-or-no questions. The coefficients of the corresponding words for each character were extracted from the logistic regression model and used as weights for the quiz. The weights were normalized so that each character's score had a maximum of 100. The quiz was then uploaded to and hosted on the website quiz-maker.com. To check that selecting only some of the original features did not impair the model's accuracy, a final model was trained with just the words used in the quiz as features. This model had an accuracy of 55%.
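The weighting step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the words, the coefficient values and the function names are hypothetical, and the normalization rule (scaling each character's weights so that the best possible set of "yes" answers gives 100) is only one plausible reading of "a maximum of 100", not necessarily the one used in the thesis. In practice the coefficients would be read from the fitted logistic regression model rather than typed in by hand.

```python
# Hypothetical coefficients: one row per character, one column per quiz word.
# The numbers are made up; in the thesis they come from the fitted
# logistic regression model.
WORDS = ["oh", "know", "really"]
COEFS = {
    "Chandler": [0.5, -0.2, 1.5],
    "Monica": [1.0, 1.0, -0.5],
    "Joey": [-0.3, 0.6, 0.2],
}


def normalize_weights(coefs, max_score=100.0):
    """Scale each character's weights so that answering "yes" to exactly the
    positively weighted questions yields max_score (an assumed rule)."""
    weights = {}
    for name, row in coefs.items():
        best = sum(c for c in row if c > 0)  # highest reachable raw score
        if best <= 0:
            raise ValueError(f"{name} has no positive coefficient")
        weights[name] = [c * max_score / best for c in row]
    return weights


def score_quiz(weights, answers):
    """Sum the weights of the questions answered "yes" (True)."""
    return {
        name: sum(w for w, yes in zip(row, answers) if yes)
        for name, row in weights.items()
    }


def best_match(scores):
    """Return the character with the highest score."""
    return max(scores, key=scores.get)


weights = normalize_weights(COEFS)
answers = [True, False, True]  # yes / no / yes to the three example questions
scores = score_quiz(weights, answers)
print(best_match(scores), scores)
```

With these made-up numbers, answering "yes" to every positively weighted question for a character gives that character the maximum score, while negative weights pull the score down for words the character rarely uses.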
