Stylometric Embeddings for Book Similarities

Detta är en Master-uppsats från KTH/Skolan för elektroteknik och datavetenskap (EECS)

Sammanfattning: Stylometry is the field of research aimed at defining features for quantifying writing style, and the most studied question in stylometry has been authorship attribution, where given a set of texts with known authorship, we are asked to determine the author of a new unseen document. In this study a number of lexical and syntactic stylometric feature sets were extracted for two datasets, a smaller one containing 27 books from 25 authors, and a larger one containing 11,063 books from 316 authors. Neural networks were used to transform the features into embeddings after which the nearest neighbor method was used to attribute texts to their closest neighbor. The smaller dataset achieved an accuracy of 91.25% using frequencies of 50 most common functional words, dependency relations, and Part-of-speech (POS) tags as features, and the larger dataset achieved 69.18% accuracy using a similar feature set with 100 most common functional words. In addition to performing author attribution, a user test showed the potentials of the model in generating author similarities and hence being useful in an applied setting for recommending books to readers based on author style. 

  HÄR KAN DU HÄMTA UPPSATSEN I FULLTEXT. (följ länken till nästa sida)