Search: "byte pair encoding BPE"

Found 2 theses containing the words byte pair encoding BPE.

  1. Incremental Re-tokenization in BPE-trained SentencePiece Models

    Bachelor's thesis, Umeå University/Department of Computing Science

    Author: Simon Hellsten; [2024]
    Keywords: BPE; Byte Pair Encoding; SentencePiece; NLP; Natural Language Processing; Tokenization; Re-tokenization

    Abstract: This bachelor's thesis in Computer Science explores the efficiency of an incremental re-tokenization algorithm for BPE-trained SentencePiece models used in natural language processing. The thesis begins by underscoring the critical role of tokenization in NLP, particularly the complexities introduced when already tokenized text is modified.

  2. Bidirectional LSTM-CNNs-CRF Models for POS Tagging

    Master's thesis, Uppsala University/Department of Linguistics and Philology

    Author: Hao Tang; [2018]
    Keywords: bidirectional LSTM; part of speech; CNNs; CRF; byte pair encoding BPE

    Abstract: To achieve state-of-the-art performance in part-of-speech (POS) tagging, traditional systems require a significant amount of hand-crafted features and data pre-processing. In this thesis, we present a hybrid neural network architecture combining discriminative word embeddings, character embeddings, and byte pair encoding (BPE) to implement a true end-to-end system without feature engineering or data pre-processing.
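Both theses build on byte pair encoding, which learns a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. As a point of reference only, here is a minimal sketch of classic word-level BPE merge learning. The function names `learn_bpe` and `apply_bpe` are hypothetical, and this is a simplification. It does not reproduce SentencePiece's implementation, which works on raw text with whitespace encoded as a symbol, nor the code of either thesis.

```python
from collections import Counter


def _merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Hypothetical helper: learn up to `num_merges` merge rules (simplified word-level BPE)."""
    # Each word starts as a tuple of single characters, weighted by its corpus frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole (weighted) corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Hypothetical helper: segment `word` by applying learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return symbols
```

For example, `learn_bpe(["low", "low", "lower", "lowest"], 2)` first merges `l`+`o` and then `lo`+`w`, so `apply_bpe("lowest", ...)` segments the word as `low`, `e`, `s`, `t`. Merges are applied in the order they were learned, which is what makes segmentation deterministic.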