Email classification using machine learning algorithms

This is a Bachelor's thesis from Uppsala University/Department of Materials Science.

Abstract: The goal of this project is to construct a machine learning algorithm that improves over time. This was done by first constructing a dataset that reflects real-world messages and simulates receiving emails from two different sources. The dataset was constructed by combining data from two different online forums. Two application programming interfaces were used to collect data and send it to the program. The dataset was tested on four different methods, and the best one would be used for the final product. The four methods were k-nearest neighbors, adaptive boosting, random forest and artificial neural network. All of these methods were tested and tuned to achieve the best accuracy. The results showed that the artificial neural network outperformed the other methods by a large margin and was the most suitable for the final product. The final product was an algorithm that improves over time. This was achieved by applying a feedback loop to the new data collected over time from the online forums: if the algorithm was confident that its predicted class for a new data point was correct, it incorporated that point into the dataset, so that over time the dataset grew larger and the algorithm adapted to new data and trends. The final result was a growing dataset that started at 1,000 data points and ended at 8,464 data points, with a total of 74 misclassifications.
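The feedback loop described above is a form of self-training (pseudo-labelling): the current model labels incoming data, and only the predictions it is confident about are added to the training set. The block below is a minimal sketch of that idea under stated assumptions, not the thesis's implementation. A k-nearest-neighbours vote stands in for the artificial neural network purely to keep the sketch dependency-free, confidence is assumed to be the share of neighbours agreeing with the winning label, and the 2-D feature vectors, the labels `"A"`/`"B"` (standing in for the two forums), the 0.9 threshold and the function names are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, x, k=5):
    """Return (label, confidence) for feature vector x via a k-NN majority vote.

    Hypothetical stand-in for the neural network: confidence is the fraction
    of the k nearest neighbours that voted for the winning label.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbours)


def self_training_step(train, new_batch, threshold=0.8, k=5):
    """Classify an unlabelled batch and add only confident predictions to train.

    All points in the batch are scored against the same (pre-update) dataset;
    the accepted (x, predicted_label) pairs are appended afterwards.
    Returns the number of points added.
    """
    accepted = []
    for x in new_batch:
        label, confidence = knn_predict(train, x, k)
        if confidence >= threshold:  # hypothetical acceptance rule
            accepted.append((x, label))
    train.extend(accepted)
    return len(accepted)


# Toy example: a small seed set and one batch of "new messages" as 2-D vectors.
seed = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
        ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]
batch = [(0.05, 0.1), (1.05, 1.0), (0.5, 0.6)]
added = self_training_step(seed, batch, threshold=0.9, k=3)
print(added, len(seed))
```

With `k=3` and a threshold of 0.9, only unanimous votes are accepted: the two points that sit clearly inside a cluster are added with their predicted labels, while `(0.5, 0.6)` gets a 2-to-1 vote and is left out. Raising the threshold trades dataset growth for a lower risk of feeding misclassified points back into training.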
