  1. Speech Emotion Recognition from Raw Audio using Deep Learning

    Master's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Jonathan Rintala; [2020]
    Keywords: Speech Emotion Recognition; Feature Learning; Deep Learning; Audio; SER; CNN; LSTM; Känsloigenkänning; Djupinlärning; Ljud; SER; CNN; LSTM;

    Traditionally, Speech Emotion Recognition models have required a large number of manually engineered features and intermediate representations, such as spectrograms, for training. However, hand-engineering such features often requires both expert domain knowledge and resources.

  2. Automatic Speech Recognition in Somali

    Master's thesis, Linköpings universitet/Statistik och maskininlärning

    Author: Naveen Gabriel; [2020]
    Keywords: automatic speech recognition; speaker adaptation; generative training; gaussian mixture model; kaldi; finite-state transducers;

    During the last decade, the field of speech recognition has left the research stage and found its way into the public market, and today speech recognition software is ubiquitous. An automatic speech recognizer understands human speech and represents it as text.

  3. Face recognition and speech recognition for access control

    One-year Master's thesis, Högskolan i Halmstad/Akademin för informationsteknologi

    Authors: Thao Tran; Nathalie Tkauc; [2019]
    Keywords: Face recognition; speech recognition;

    This project is a collaboration with the company JayWay in Halmstad. To enter the office today, employees need a key tag and guests must ring a doorbell. If someone rings the doorbell, someone on the inside has to open the door manually, which is considered a disturbance during working hours.

  4. Teknik för dokumentering av möten och konferenser

    Bachelor's thesis, KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author: Milan Stojanovic; [2019]
    Keywords: Speech-to-text; Speaker recognition; Software development; Architecture; Design; Conference tool; Tal-till-text; Talarigenkänning; Systemutveckling; Arkitektur; Design; Konferensverktyg;

    At most companies, meetings and conferences are documented by one or more people sitting at a computer and typing what is said during the meeting. This may lead to typing mistakes or incorrect perception by the person taking the notes. The human factor is quite large.

  5. Implementation of an 8-bit Dynamic Fixed-Point Convolutional Neural Network for Human Sign Language Recognition on a Xilinx FPGA Board

    Master's thesis, Lunds universitet/Institutionen för elektro- och informationsteknik

    Author: Ricardo Núñez-Prieto; [2019]
    Keywords: Artificial Intelligence; Computer Vision; Machine Learning; Convolutional Neural Networks; FPGA; Sign Language Recognition; Technology and Engineering;

    The goal of this thesis work is to implement a convolutional neural network on an FPGA device with the capability of recognising human sign language. The set of gestures that the neural network can identify has been taken from the Swedish sign language, and it consists of the signs used for representing the letters of the Swedish alphabet (a.k.a.