Evaluating and Fine-Tuning a Few-Shot Model for Transcription of Historical Ciphers

This is a Master's thesis from Uppsala University, Department of Linguistics and Philology

Abstract: Thousands of historical ciphers (encrypted manuscripts) are stored in archives across Europe. Historical cryptology is the research field concerned with studying these manuscripts, combining the interests of the humanities with methods from cryptography and computational linguistics. Before a cipher can be decrypted by automatic means, it must first be transcribed into machine-readable digital text. Image processing techniques and Deep Learning have enabled automatic transcription of handwritten text, but the task faces challenges when ciphers constitute the target data. The main reason is a lack of labeled data, caused by the heterogeneity of handwriting and the tendency of ciphers to employ unique symbol sets. Few-Shot Learning is a machine learning framework that reduces the need for labeled data by using pretrained models in combination with support sets containing a few labeled examples from the target data set. This project evaluates a Few-Shot model on the task of transcribing historical ciphers. The model is tested on pages from three in-domain ciphers that vary in handwriting style and symbol set. The project also investigates further fine-tuning of the model by training it on a limited number of labeled symbol examples from the respective target ciphers. We find that the performance of the model depends on the handwriting style of the target document, and that certain model parameters should be explored individually for each data set. We further show that fine-tuning the model is indeed effective, lowering the Symbol Error Rate (SER) by up to 27.6 percentage points.
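
The abstract reports results as Symbol Error Rate (SER) without defining it. As a minimal sketch, assuming the common definition (the edit distance between the predicted and the reference symbol sequence, divided by the reference length), it could be computed as follows. The function names are hypothetical and not taken from the thesis, which may define or compute SER differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    # prev[j] holds the distance between the first i-1 reference symbols
    # and the first j hypothesis symbols.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """SER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substitution and one deletion against a four-symbol reference.
print(symbol_error_rate(["a", "b", "c", "d"], ["a", "x", "c"]))  # 0.5
```

Under this definition, an improvement of 27.6 percentage points is an absolute difference between two SER values (for example, from 0.500 to 0.224), not a relative reduction.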
