Perceptually meaningful time and frequency resolution in applying dialogue enhancement in noisy environments: Dialogue Enhancement research

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS).

Abstract: Dialogue Enhancement (DE) is a process used in audio delivery systems to improve the clarity, intelligibility, and overall quality of spoken dialogue in audio content. It is used primarily when dialogue is masked by music, ambient noise, or other audio sources. This thesis project comprises experiments to find the optimal time and frequency resolution for a DE system. The time-resolution experiments vary the attack and release times of a DE system. The frequency-domain analysis investigates whether listeners prefer a gain that depends on the noise spectrum over a conventional full-band gain. The research methodology has three main parts. The first part covers the system setup and the selection of content/vectors for the experiments. The second part designs the time- and frequency-resolution experiments: an exponential smoothing model amplifies or attenuates the dialogue stream with various attack/release times, and, for the frequency counterpart, a banded gain model takes banded noise levels as input. The third part is a modified subjective listening test designed to evaluate these experiments; listener responses for various content-noise combinations are recorded and analyzed. Finally, the main outcome of this research emphasizes the advantages of a DE system and paves the way for further exploration of DE models and rigorous testing schemes with expert listeners.
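The abstract names two models, an exponential smoothing model with attack/release times and a banded gain model driven by banded noise levels, without giving their equations. The Python sketch below illustrates what such models could look like under stated assumptions; it is not the thesis's implementation. All function names, the 10 dB target dialogue-to-noise ratio, the 12 dB gain ceiling, the 100 Hz frame rate, the use of per-band dialogue levels alongside the noise levels, and the convention that the attack time applies while the boost is increasing are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def smoothing_coeff(time_constant_s: float, frame_rate_hz: float) -> float:
    """One-pole smoothing coefficient for a time constant (s) at a frame rate (Hz)."""
    if time_constant_s <= 0.0:
        return 0.0  # zero time constant: follow the target gain instantly
    return math.exp(-1.0 / (time_constant_s * frame_rate_hz))


def smooth_gain(target_gains_db: Sequence[float], attack_s: float, release_s: float,
                frame_rate_hz: float, initial_db: float = 0.0) -> List[float]:
    """Exponentially smooth a per-frame target gain trajectory (dB).

    Assumed convention (hypothetical): the attack time constant is used while
    the target gain is above the current gain (boost increasing) and the
    release time constant while it is not (boost decreasing or steady).
    """
    a_att = smoothing_coeff(attack_s, frame_rate_hz)
    a_rel = smoothing_coeff(release_s, frame_rate_hz)
    g = initial_db
    smoothed = []
    for target in target_gains_db:
        a = a_att if target > g else a_rel
        g = a * g + (1.0 - a) * target  # first-order recursive (exponential) smoother
        smoothed.append(g)
    return smoothed


def banded_gains_db(dialogue_db: Sequence[float], noise_db: Sequence[float],
                    target_snr_db: float = 10.0, max_gain_db: float = 12.0) -> List[float]:
    """Hypothetical noise-dependent banded gain: boost each band just enough to
    reach target_snr_db dialogue-to-noise ratio, clamped to [0, max_gain_db]."""
    if len(dialogue_db) != len(noise_db):
        raise ValueError("dialogue and noise must have the same number of bands")
    gains = []
    for d, n in zip(dialogue_db, noise_db):
        needed = target_snr_db - (d - n)
        gains.append(min(max(needed, 0.0), max_gain_db))
    return gains


def power_sum_db(levels_db: Sequence[float]) -> float:
    """Combine band levels (dB) into one overall level by summing powers."""
    return 10.0 * math.log10(sum(10.0 ** (x / 10.0) for x in levels_db))


def full_band_gain_db(dialogue_db: Sequence[float], noise_db: Sequence[float],
                      target_snr_db: float = 10.0, max_gain_db: float = 12.0) -> float:
    """Conventional single gain applied to all bands, computed from overall levels."""
    needed = target_snr_db - (power_sum_db(dialogue_db) - power_sum_db(noise_db))
    return min(max(needed, 0.0), max_gain_db)


if __name__ == "__main__":
    # Frame rate of 100 Hz; target boost steps 0 dB -> 6 dB -> 0 dB.
    targets = [0.0] * 10 + [6.0] * 50 + [0.0] * 50
    fast = smooth_gain(targets, attack_s=0.05, release_s=0.5, frame_rate_hz=100.0)
    slow = smooth_gain(targets, attack_s=0.5, release_s=2.0, frame_rate_hz=100.0)
    print(f"gain after 50 boost frames: fast={fast[59]:.2f} dB, slow={slow[59]:.2f} dB")
    print(banded_gains_db([60, 55, 50], [50, 52, 55]))  # → [0.0, 7.0, 12.0]
    print(round(full_band_gain_db([60, 55, 50], [50, 52, 55]), 2))
```

With these made-up band levels, the banded rule leaves the already-clear first band unboosted and concentrates the boost where noise dominates, while the full-band rule applies one compromise gain to every band. Varying `attack_s` and `release_s` changes how quickly the smoothed gain tracks the target, which is the kind of parameter sweep the time-resolution experiments describe; the thesis's actual gain rule and smoothing convention may differ.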
