Detecting Plagiarism Patterns in Student Code

This is a Bachelor's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Authors: Johan Henning; Nicolai Hellesnes; [2019]


Abstract: Plagiarism has become a major concern in programming, both in education and in the software industry. While considerable effort has been put into detecting plagiarism, most of it has focused on plain text. As plagiarism detection has improved, the methods for cheating have evolved as well. This thesis examines plagiarism in entry-level programming courses to discover how widespread cheating is, and whether plagiarism detection algorithms combined with metadata from GitHub can detect cheating more effectively. More specifically, commit metadata from GitHub is used to look for distinctive patterns among students who plagiarize. The dataset consists of the GitHub repositories for the entry-level programming courses DD1337 and DD1338 from 2015: 17 programming assignments with around 200 student submissions per assignment. The plagiarism detection tool used was MOSS (Measure of Software Similarity), and each week the 10 most suspicious submissions were added to a suspicious list, which was later used to look for patterns among students who plagiarize. The results show that suspicious students averaged 5.27 commits per assignment, while non-suspicious students averaged 6.49. In other words, suspicious students made fewer commits on average than non-suspicious students. Future work includes testing on larger datasets and examining other metadata to find further patterns in cases of plagiarism.
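The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes each submission has already been reduced to a commit count and a single similarity score (MOSS itself reports pairwise matches, so collapsing them into one per-submission score is an assumption here). The `Submission` record, `flag_suspicious`, `mean_commits`, and the toy data are all hypothetical. The sketch flags the `top_k` highest-scoring submissions per assignment, mirroring the weekly top-10 list, and then averages commits per submission in the flagged and unflagged groups.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Submission:
    student: str
    assignment: str
    commits: int       # commits in the student's repository for this assignment
    similarity: float  # hypothetical per-submission score, e.g. derived from a MOSS report


def flag_suspicious(subs, top_k=10):
    """Return (student, assignment) pairs ranked in the top_k by similarity, per assignment."""
    by_assignment = {}
    for s in subs:
        by_assignment.setdefault(s.assignment, []).append(s)
    flagged = set()
    for group in by_assignment.values():
        ranked = sorted(group, key=lambda s: s.similarity, reverse=True)
        flagged.update((s.student, s.assignment) for s in ranked[:top_k])
    return flagged


def mean_commits(subs, flagged):
    """Average commits per submission for flagged vs. unflagged submissions (NaN if a group is empty)."""
    suspicious = [s.commits for s in subs if (s.student, s.assignment) in flagged]
    others = [s.commits for s in subs if (s.student, s.assignment) not in flagged]
    avg = lambda xs: mean(xs) if xs else float("nan")
    return avg(suspicious), avg(others)


# Invented toy data, for illustration only.
toy = [
    Submission("a", "week1", 3, 0.9),
    Submission("b", "week1", 7, 0.2),
    Submission("c", "week1", 6, 0.1),
    Submission("a", "week2", 4, 0.3),
    Submission("b", "week2", 8, 0.1),
    Submission("c", "week2", 2, 0.8),
]

flagged = flag_suspicious(toy, top_k=1)
print(mean_commits(toy, flagged))
```

With `top_k=1`, the toy data flag one submission per week, giving an average of 2.5 commits for flagged submissions and 6.25 for the rest. Ranking within each assignment, rather than globally, keeps the number of flagged submissions per week fixed regardless of how similar that week's submissions are overall.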
