Search: "Simon Hellsten"

Found 1 thesis containing the words Simon Hellsten.

  1. Incremental Re-tokenization in BPE-trained SentencePiece Models

    Bachelor's thesis, Umeå University/Department of Computing Science

    Author: Simon Hellsten; [2024]
    Keywords: BPE; Byte Pair Encoding; SentencePiece; NLP; Natural Language Processing; Tokenization; Re-tokenization

    Abstract: This bachelor's thesis in Computer Science explores the efficiency of an incremental re-tokenization algorithm in the context of BPE-trained SentencePiece models used in natural language processing. The thesis begins by underscoring the critical role of tokenization in NLP, particularly highlighting the complexities introduced by modifications in tokenized text.